(54) A design environment for hardware/software co-design



(57) The present invention is related to a design apparatus compiled on a computer environment for generating from a heterogeneous specification a heterogeneous implementation of a system, said system comprising at least one digital system part, said heterogeneous specification comprising a number of paradigms having associated behavioural and structural languages, said heterogeneous implementation comprising hardware parts and software parts, said apparatus comprising:

a database compiled on said computer environment, adapted for access by executable programs on said computer environment for generating the implementation of said system, said database comprising a plurality of objects representing aspects of said system wherein said objects comprise primitive objects representing the specification of said system and hierarchical objects being created by said executable programs while generating the implementation of said system, said hierarchical objects being refinements of said primitive objects and having more detail and preserving any one or all of said aspects to thereby generate said implementation of said system; and said database further comprising relations in-between said primitive objects and in-between said hierarchical objects and between said primitive objects and said hierarchical objects; and further comprising functions for manipulating said objects and said relations;

means for simulating said system comprising a plurality of simulators for said behavioural and structural languages;

means for implementing said system comprising a plurality of compilers for said behavioural and structural languages; means for allocating hardware components for an implementation of said system; means for assigning hardware subsystems and software subsystems to allocated hardware components;

means for implementing the communication between said software subsystems and said hardware subsystems, one of the aspects of said communication being represented by ports;
characterized in that said apparatus further comprises:

means for creating processor models of said allocated hardware components as primitive objects in said database, said models comprising software models representing the software views on said hardware components and hardware models representing the hardware views on said hardware components,

means for specifying said system in said apparatus, the specification in said apparatus comprising processes being primitive objects and comprising ports, said ports structuring the communication between said processes and being primitive objects, said processes comprising a description of each of the different parts of said system in a specific host language encapsulation, said encapsulation comprising threads and context, a plurality of behavioural and structural languages being supported in said apparatus, and

means for encapsulating said simulators, said compilers, said hardware components, said hardware subsystems and said software subsystems in said apparatus to thereby create a consistent communication between said encapsulation and said encapsulated simulators, compilers, hardware components, hardware subsystems and software subsystems; thereby enabling the refinement process of said primitive objects while generating the implementation of said system.



[FIG. 2 (drawing): blocks labelled Co-Simulation, Co-Synthesis, C2VHDL, SYMPHONY: HW/SW & HW/HW Interface Synthesis, and Encapsulated Component Compilers (including VHDL and ARM cc), with reference numerals 12 to 21.]



EP 1 065 611 A2



Description 

Field of the invention 

[0001] The present invention relates to a design environment for hardware/software co-design. More specifically, the hardware/software co-design of this invention comprises the specification, synthesis, and simulation of heterogeneous systems.


Background of the invention 

[0002] Digital communication techniques form the basis of the rapid breakthrough of modern consumer electronics, wireless and wired voice- and data-networking products, broadband networks and multi-media applications. Such products are based on digital communication systems, which are made possible by the combination of VLSI technology and Digital Signal Processing.
[0003] Digital systems perform real-time transformations on time-discrete digitized samples of analogue quantities with finite bandwidth and signal-to-noise ratio. These transformations can be specified in programming languages and executed on a programmable processor or directly on application-specific hardware. The choice is determined by trade-offs between cost, performance, power and flexibility.

[0004] Hence digital systems are candidates par excellence for hardware/software co-design.
[0005] In contrast to analogue processing, digital processing guarantees perfect reproducibility, storage and testability. Signal quality is a matter of exact mathematical operations. The price paid is the cost of hardware and the performance needed to satisfy the hard real-time character. This problem is now solved by the abundance of digital VLSI (Very Large Scale Integration) technology, which provides cheap storage and high-speed computation. Therefore, the combination of VLSI technology and digital processing has made possible the breakthrough of modern consumer electronics, portable and personal communication, broadband networks, multi-media, and automotive applications.
[0006] The design process of the products for these applications is subject to a number of constraints. A first constraint is that they must be implemented in silicon or another hardware platform for power, performance and cost reasons. A second constraint is that these products implement systems conceived by a highly specialized system team thinking in terms of executable concurrent programming paradigms which, today, are not well understood by hardware designers. Hence most specifications are first translated into English and then redesigned in a specific hardware description language such as VHDL or VERILOG for the hardware components and a software description language such as C or assembler for the software components. Although the hardware and software interact tightly, they are designed separately. Only after system assembly are the software and hardware run together. As a consequence, the design can be far from optimal or even erroneous, making a redesign cycle mandatory. This gap between system design and implementation is rapidly becoming the most important bottleneck in the design process of said products or said systems. Another constraint is that, for reasons of cost-effectiveness and time-to-market, design productivity must increase by at least an order of magnitude. Yet another constraint is that re-use of designs, as well as a design-for-re-use methodology, will have to be adopted. Said methodology implies hardware/software co-design at several levels of implementation.
[0007] J. Buck et al. in "PTOLEMY: A framework for simulating and prototyping heterogeneous systems" (International Journal on Computer Simulation, January 1994) focus on an environment for hardware/software co-simulation. The proposed methodology only allows hardware/software co-design of systems based on a data-flow algorithm. Furthermore, hardware/software interface synthesis is not supported.
[0008] United States Patent No. 5,197,016 discloses a computer-aided system and method for designing an application specific integrated circuit whose intended function is implemented both by hardware subsystems and software subsystems. The proposed methodology only allows a single-processor design and is only valid for specifications based on a state transition diagram. The hardware/software co-design of systems based on a heterogeneous specification is not supported.

[0009] S. Narayan, F. Vahid, and D. Gajski, in "System specification with the SpecCharts language" (IEEE Design & Test of Computers, pages 6-13, December 1992), disclose a methodology that builds on VHDL. The methodology does not support the hardware/software co-design of systems based on a heterogeneous specification.

[0010] P. Chou, R. Ortega, and G. Borriello, in "Synthesis of the hardware/software interface in microcontroller-based systems" (Proceedings of the IEEE International Conference on Computer-Aided Design, ICCAD 92, pages 488-495, November 1992), show a method for hardware/software interface generation for microcontroller-based systems. Said method assumes that the user determines the software interfacing, such as the communication with drivers, before the start of the system synthesis task.

[0011] The prior art references Ismail and Jerraya, "Synthesis Steps and Design Models for Codesign", published in Computer, February 1995, and Valderrama et al., "A Unified Model for Co-Simulation and Co-Synthesis of Mixed Hardware/Software Systems", Proceedings of the European Design and Test Conference, Paris, March 1995, describe an environment for hardware/software co-design. A unified co-synthesis and co-simulation methodology is described therein. The environment can also accommodate several architectural models through the use of a library of communication models enabling the abstraction of existing communication schemes. The same module descriptions are usable with different architectures in terms of their underlying communication protocols.
[0012] A first problem with the SOLAR/COSMOS design environment as referred to in these documents is that it does not support the combination of different specification and description languages. It provides a unified model (SOLAR) into which several specification languages can be translated. The selection of the unified model (SOLAR) restricts the specification languages that can be used as input for co-design and co-synthesis of a heterogeneous system. Only those specification languages whose model corresponds to the SOLAR model (an extended Finite State Machine model) can be supported by the design environment described in the prior art references. At this moment, only SDL is supported by SOLAR. The C and VHDL paradigms are partially incompatible with SOLAR, and only limited specification descriptions can be supported in SOLAR.

[0013] The design environment described in these documents therefore does not allow a heterogeneous system specification (i.e. one that uses different languages or "paradigms" for different system components) as input to co-design and co-synthesis. In the design process starting from the SOLAR model, different code paradigms such as C and VHDL code can be generated.

[0014] A second problem in the design environment for hardware/software co-design as described in the prior art references is that the SOLAR/COSMOS method can only deal with fixed hardware/software target architectures. It can deal with various types of hardware/software target architectures as long as these are known in advance of the design process: any hardware/software target architecture that serves as a target for SOLAR/COSMOS has to be known in advance (blocks, interconnections, drivers, ...).
[0015] In particular, the design methodology as implemented in the SOLAR/COSMOS environment makes use of a library of communication models. A corresponding library of communication units is also included.

[0016] In SOLAR/COSMOS, at a high level of abstraction, the communication between the different system components is modelled as a logical communication network providing a set of services. More specifically, the communication occurs via logical channel units (the communication models). The communication synthesis (design) problem is then formulated as:

1. Fixing the number of communication units (selected from a library) that will implement the functionality of the communication network.

2. Assigning logical channel units to allocated communication units.
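The two-step formulation above can be illustrated with a small greedy sketch. It assumes, purely for illustration, that each library unit type is characterised by the set of channel protocols it implements and a maximum number of logical channels per instance; `UnitType`, `UnitInstance`, `synthesize` and the first-fit rule are hypothetical and are not taken from SOLAR/COSMOS. The sketch also shows the limitation criticised below: a channel whose protocol no library unit implements cannot be synthesised.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class UnitType:
    """A library entry: a communication unit type (hypothetical model)."""
    name: str
    protocols: frozenset   # logical channel protocols the unit can implement
    max_channels: int      # how many logical channels one instance can carry


@dataclass
class UnitInstance:
    """An allocated communication unit and the logical channels assigned to it."""
    kind: UnitType
    channels: List[str] = field(default_factory=list)

    def can_host(self, protocol: str) -> bool:
        return protocol in self.kind.protocols and len(self.channels) < self.kind.max_channels


def synthesize(channels: List[Tuple[str, str]], library: List[UnitType]) -> List[UnitInstance]:
    """Step 1 (allocate units) and step 2 (assign channels), done greedily, first fit."""
    allocated: List[UnitInstance] = []
    for name, protocol in channels:
        # Step 2: try to reuse a unit that is already allocated.
        host = next((u for u in allocated if u.can_host(protocol)), None)
        if host is None:
            # Step 1: allocate a new unit from the library; fail if none fits.
            kind = next((k for k in library
                         if protocol in k.protocols and k.max_channels > 0), None)
            if kind is None:
                raise ValueError(f"no library unit implements protocol {protocol!r}")
            host = UnitInstance(kind)
            allocated.append(host)
        host.channels.append(name)
    return allocated
```

With a library holding a two-channel FIFO unit and a four-channel bus unit, three FIFO channels and one handshake channel would occupy two FIFO units and one bus unit, each allocated as a whole.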

[0017] As a consequence, a communication unit must be available in the library for each possible communication between the system components of the final design implementation (i.e. for every combination of processors).

[0018] A communication unit is an object that can execute one or several procedures that may share some common resource(s) (see D2, p. 181). The sharing of the common resource(s) is controlled by a communication controller. A communication unit may correspond either to an existing communication platform or to a subsystem resulting from an earlier design session (D2, p. 181). In both cases, the hardware/software target architecture of the final design implementation is already in place; it just needs to be used. The available communication units are placed into a library of components and are not synthesised for each design session (p. 181).

[0019] These communication units are in fact real I/O processors. They provide an amount of functionality that can be used to implement the communication required by the logical channel units assigned to them, but they are always instantiated (allocated) as a whole. This leads to sub-optimality, and no customised, flexible design solution can be generated.
[0020] Thus, SOLAR/COSMOS can only support a number of pre-defined hardware/software target architectures. Which part of the hardware/software target architecture will actually be used for the communication in a specific application differs from application to application. Most applications, however, will not use all of the I/O mechanisms available in such a pre-defined architecture, and therefore sub-optimal design implementations will result.
[0021] None of the prior art solutions provides a design environment based on a data model that allows one to specify, simulate and implement or synthesize heterogeneous hardware/software implementations starting from a heterogeneous system specification. In the following paragraphs of this section, the characteristics of specifications of such heterogeneous systems are analysed.


Problem Definition 

[0022] In the strictest sense, digital systems are algorithms mapping digital signals into digital signals in real time. The real-time constraint is determined by the repetition period of the algorithm for consuming an input frame and producing a new output frame.
[0023] The periodicity of this constraint and the nature of the signals imply that the elementary algorithm is a data-flow function.

[0024] A Synchronous Data-Flow (SDF) algorithm can be modeled as an acyclic graph where nodes are operators and edges represent data precedences. This graph is called a data-flow graph. An operator can execute when a predetermined, fixed number of data (tokens) are present at its input, and it then produces one or more output data (tokens).
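The firing rule of [0024] can be sketched for a single operator on one input edge and one output edge. This is a minimal illustration, not the patent's simulator: `SdfActor` and `run_until_blocked` are hypothetical names, edges are modeled as FIFO queues of tokens, and the example operator is a 2:1 averaging decimator.

```python
from collections import deque
from typing import Callable, Deque, List


class SdfActor:
    """An operator with fixed token rates, as in synchronous data-flow."""

    def __init__(self, name: str, consume: int, produce: int,
                 fn: Callable[[List[float]], List[float]]):
        if consume < 1:
            raise ValueError("an SDF operator must consume at least one token")
        self.name = name
        self.consume = consume    # tokens required on the input edge per firing
        self.produce = produce    # tokens emitted on the output edge per firing
        self.fn = fn

    def can_fire(self, inp: Deque[float]) -> bool:
        # SDF firing rule: depends only on the token count, never on token values
        return len(inp) >= self.consume

    def fire(self, inp: Deque[float], out: Deque[float]) -> None:
        args = [inp.popleft() for _ in range(self.consume)]
        results = self.fn(args)
        if len(results) != self.produce:
            raise ValueError(f"{self.name} produced {len(results)} tokens, declared {self.produce}")
        out.extend(results)


def run_until_blocked(actor: SdfActor, inp: Deque[float], out: Deque[float]) -> int:
    """Fire the actor as long as its firing rule holds; return the number of firings."""
    firings = 0
    while actor.can_fire(inp):
        actor.fire(inp, out)
        firings += 1
    return firings
```

Feeding five tokens to a decimator declared with `consume=2, produce=1` fires it twice and leaves one token waiting on the input edge; because the rates are fixed, this count is known before execution, which is what makes static scheduling possible.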
[0025] Conditional selection of two operations to a single output is allowed. Operators have no state, but arrays resulting from one algorithm execution can be saved for reuse in future executions (delay operator). Many digital processing algorithms are of this type. They can be described very efficiently by so-called applicative programming languages like SILAGE.
[0026] In contrast, dynamic data-flow (DDF) algorithms contain data-dependent token production and consumption. They allow for while and if-then-else constructs.
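A minimal sketch of data-dependent consumption, modeled on the boolean "select" operator commonly used to express if-then-else in dynamic data-flow. The function names are hypothetical. Unlike the SDF rule, whether the operator can fire, and from which input it consumes, depends on the value of the control token, so the number of tokens taken from each input cannot be fixed at compile time.

```python
from collections import deque
from typing import Deque, List


def select_can_fire(ctrl: Deque[bool], true_in: Deque[int], false_in: Deque[int]) -> bool:
    """Data-dependent firing rule: which input must hold a token depends on the control value."""
    if not ctrl:
        return False
    return len(true_in) > 0 if ctrl[0] else len(false_in) > 0


def select_fire(ctrl: Deque[bool], true_in: Deque[int], false_in: Deque[int],
                out: List[int]) -> None:
    """Consume one control token, then one token from the input it selects."""
    chosen = true_in if ctrl.popleft() else false_in
    out.append(chosen.popleft())


def run_select(ctrl: Deque[bool], true_in: Deque[int], false_in: Deque[int]) -> List[int]:
    """Fire repeatedly until the data-dependent firing rule no longer holds."""
    out: List[int] = []
    while select_can_fire(ctrl, true_in, false_in):
        select_fire(ctrl, true_in, false_in, out)
    return out
```

For control tokens `True, False, True`, the operator takes two tokens from the "true" input and one from the "false" input; a different control sequence would change those counts.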
[0027] Computer-Aided Design (CAD) environments for digital systems such as DSP-Station of Mentor Graphics, PTOLEMY, GRAPE-II and COSSAP all allow for specification of SDF and DDF and use static scheduling as much as possible to provide simulation speeds that are up to two orders of magnitude faster than event-driven simulators such as those used for VHDL. This justifies the use of these simulation paradigms for digital system specification and validation.
[0028] However, when we consider digital processing systems in the broad sense, a wider scope is necessary, as illustrated in FIGURE 1, which is an abstraction of many practical implementations of digital processing systems. A careful look at FIGURE 1 allows us to identify five common characteristics of digital processing system specifications, numbered 1 to 5 below:

1. Digital systems typically comprise one (or more) signal paths 1 as well as slow control loops 2 and a reactive control system 3 taking events 4 of a slow environment such as a user interface (UI) 5 and slow status information 6 of the signal paths as inputs to control the mode or parameters of the signal paths.

2. A signal path 1 is usually a concatenation of data-flow functional blocks (DFBs) 7, such as h1, h2, ..., L2, often operating at fairly different data and execution rates and transforming the format of the data. The rate and format differences naturally result from operations such as frequency down- or up-conversion, bit-to-symbol modulation, data compression and error correction coding. When these DFBs operate on unfragmented signal words, they can best be specified as data-flow algorithms (e.g. in SILAGE, DFL, or C). Others that manipulate individual bits of the signals can be directly specified as Finite State Machines with Data paths (FSMD) at VHDL register-transfer or behavioral level. Hence the specification format depends on the type of data-flow functional block.

3. DFBs in the signal path are internally strongly interconnected data-flow graphs with sparse external communication. Hence, from an implementation viewpoint, they are seldom partitioned over several hardware or software components. Rather, they will be merged onto the same component if throughput and rate constraints allow it. Merging implies sequentializing the concurrent processes on a single component while still satisfying the timing constraints. This requires software synthesis techniques that encapsulate single-thread compilers in order to allow real-time scheduling of concurrent processes.

4. Control loops and mode control by parameter setting are common to almost all digital processing systems. For example, all digital communication systems have tracking and acquisition loops in order to synchronise the frequency and phase of the receiver signal path to the characteristics of the incoming signal. Design of these loops is one of the most difficult tasks, since their characteristics depend strongly on the noise and distortion properties of the communication channel. It involves the design of phase-locked loops, delay-locked loops, and fast Fourier transforms, controlled by "events" disturbing the regularity of the signal streams. The occurrence rate of these events is orders of magnitude slower than the data rate in the signal path. Hence, similar to the UI, the processes modeling these slow control loops and mode setting have reactive rather than data-flow semantics. They run concurrently with the data flow and often themselves consist of concurrent processes. Such a control-dominated system can be described as a Program State Machine (PSM), which is a hierarchy of program-states, in which each program-state represents a distinct mode of computation. Formalisms such as StateCharts or SpecCharts, which include behavioral hierarchy, exception handling and inter-process communication modeling, are needed to describe such systems. In practice, synchronization is very often specified in one or more concurrent C programs.

5. Digital systems contain both high and low data-rate blocks in the signal path. High data-rate blocks are synthesised directly in hardware. Low data-rate blocks are candidates for implementation on programmable processors. Hence digital systems are natural candidates for hardware/software co-design.
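Items 2 and 3 describe DFBs running at different rates that are merged and sequentialized on one component. A common first step toward such a static schedule is solving the SDF balance equations for the repetition vector, i.e. how often each block fires per schedule period. The sketch below is restricted to a linear chain of blocks (an assumption; real signal paths may branch or contain delays), and `repetition_vector` is a hypothetical name. For example, a source producing one sample per firing, feeding a 4:1 down-converter that feeds a 2:1 decimator, gives firing counts 8, 2 and 1.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm   # math.lcm requires Python 3.9+
from typing import List, Tuple


def repetition_vector(edges: List[Tuple[int, int]]) -> List[int]:
    """Smallest firing counts per schedule period for a linear chain of SDF blocks.

    edges[i] = (tokens produced by block i per firing,
                tokens consumed by block i+1 per firing).
    Balance equation on every edge: r[i] * produced == r[i+1] * consumed.
    """
    rates = [Fraction(1)]
    for produced, consumed in edges:
        # Relative firing rate of the next block implied by the balance equation.
        rates.append(rates[-1] * produced / consumed)
    # Scale the rational rates to the smallest vector of positive integers.
    scale = reduce(lcm, (r.denominator for r in rates), 1)
    counts = [int(r * scale) for r in rates]
    g = reduce(gcd, counts)
    return [c // g for c in counts]
```

Once the counts are known, the merged blocks can be sequentialized on one processor by a static loop schedule that fires each block its computed number of times per period.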

[0029] From the above it follows that digital systems require a combination of data models for their specification. Specification languages are tightly coupled to these data models, paradigms, simulators, and synthesis tools.

[0030] Nowadays, the dominant specification language of the digital system designer is C or a DFL for the main signal path, whereas FSMDs and PSMs are usually described in an HDL. For the description of communication channels and communication protocols, other formalisms such as timing diagrams, Extended Signal Transition Graphs, and Communicating Sequential Processes must be considered. A CAD system for digital systems must be able to encapsulate all these paradigms and their associated languages and design environments.

[0031] Digital systems design thus requires the ability to mix data-flow and reactive paradigms with widely different time constants. The difference in time constants between control flow and data flow poses special problems in simulation. It requires all processes to be simulatable at the highest possible abstraction level.
[0032] Not only the specification of a digital system is heterogeneous by nature; the implementation architecture of a digital system is heterogeneous as well. An example implementation architecture comprises the following types of components and the communication between these components:

- programmable processors,
- application specific processors with hardwired controller,
- application specific processors with specialized instruction set,
- hardware accelerators,
- micro controllers,
- communication blocks and memory,
- peripherals (DMA, UART, ...).

[0033] Thus, a design method for a digital system must bridge the gap between the heterogeneous specification of the system and its heterogeneous implementation. Today's synthesis tools and compilers allow us to synthesize or program all the processor, accelerator and memory components once the global system architecture has been defined. However, the availability of these component compilers is necessary, but not sufficient. What is needed are the models and tools to refine the functional specification of a system into the detailed architecture: the definition and allocation of the components and their communication and synchronization. The most essential step is to generate the necessary software and hardware to make processors, accelerators, and the environment communicate.
[0034] One of the keys to mastering the complexity of digital system design is the reuse of components. The design process for a digital system must allow the modeling of reusable components and support a design-for-reuse methodology that makes it possible to design components that are easily reusable. The problem in reusing previously designed components lies in the fixed communication protocols they use, which necessitate protocol conversions when processors with different protocols have to be interfaced. Nowadays, the selection of a protocol is done while designing the component: functional and communication behavior are intrinsically mixed. However, a good selection of the protocol is possible only when all components involved in the communication are known. Therefore, a design environment for digital systems has to allow a component to be initially described purely functionally. Later, when the component is (re)used in a system, the design environment must allow the most appropriate communication behavior to be plugged in. This approach is in contrast with current hardware (VHDL) design practices, where communication and functional behavior are mixed.
[0035] Another key to mastering the complexity of digital system design is modularity. In modular designs, the complete system functionality is split into communicating components of manageable complexity. The advantage of this approach is that the components can be reused and that the system is easier to adapt and maintain.

[0036] The disadvantage is the overhead caused by the inter-component communication, or by the compiler not optimizing across component boundaries. Therefore, the inter-component communication semantics should be such that modularity can easily be removed when merging two components into a single component.
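One way to read the requirement of [0036]: if a channel between two components is point-to-point, lossless and FIFO-ordered (an assumption made here for illustration), then merging the components can replace the channel with direct data passing without changing behavior. The hypothetical functions below contrast the modular form, with an explicit channel, and the merged form, in which a compiler can optimize across the former boundary.

```python
from collections import deque
from typing import Deque, List


def producer(xs: List[int], channel: Deque[int]) -> None:
    """Component 1: writes its results into a point-to-point FIFO channel."""
    for x in xs:
        channel.append(x + 1)


def consumer(channel: Deque[int]) -> List[int]:
    """Component 2: reads every token from the channel in FIFO order."""
    out = []
    while channel:
        out.append(channel.popleft() * 3)
    return out


def modular(xs: List[int]) -> List[int]:
    """Two separate components connected by an explicit channel (communication overhead)."""
    ch: Deque[int] = deque()
    producer(xs, ch)
    return consumer(ch)


def merged(xs: List[int]) -> List[int]:
    """Same behaviour after merging: the channel disappears and only one loop remains."""
    return [(x + 1) * 3 for x in xs]
```

Both versions return the same results for any input list, which is the property that makes the modularity removable.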

[0037] In the past, a lot of effort has been put into design environments for implementing the components of a digital system.

[0038] Languages with associated simulators, tuned towards specific application domains, allow components to be specified and simulated at a high abstraction level. Hardware compilers can implement the component description as processors with highly specialized architectures. Software compilers generate machine code for off-the-shelf programmable processors. Instruction set simulators allow the machine code to be debugged at different levels of abstraction (C, asm). Examples of such design environments are Cathedral-1/2/3, the ARM processor tool suite (C compiler and the ARMulator), and the Synopsys synthesis tools. From the above it can be concluded that the components of digital systems can be implemented with off-the-shelf design environments. What is missing is the glue that links these design environments together and automatically interfaces the generated or off-the-shelf processors according to the system specification. Hence, a system design environment should make it easy to include existing design environments. It should provide synthesis tools for hardware/hardware and hardware/software interfacing that are independent of processor and design environment. To achieve this, the specification method must allow off-the-shelf components to be modeled on an as-is basis.

[0039] In summary, the following requirements can be defined for a hardware/software system design environment:

- Modularity is essential to master complexity, but the overhead should be minimal and removable.
- Different description languages are needed to allow each system component to be described with the most appropriate paradigm.
- The design environment must be able to model the heterogeneous conceptual specification, the resulting heterogeneous architecture and all refinement steps in between.
- Off-the-shelf components and the associated design environments need to be modeled.
- A clear separation between functional and communication behavior is required to allow designs to be reused.
- Processor-independent interface synthesis is essential.

Summary of the invention 

[0040] A design environment meeting the above-stated requirements for a hardware/software system co-design environment is disclosed in the present application. A hardware/software co-design environment based on a data model that allows one to specify, simulate, and synthesize heterogeneous hardware/software architectures from a heterogeneous specification is disclosed. Said environment is based on the principle of encapsulation of existing hardware and software compilers and allows for the interactive synthesis of hardware/software and hardware/hardware interfaces.
[0041] It is an object of the present invention to disclose a design environment for implementing a heterogeneous essentially digital system. Said system comprises a database compiled on a memory structure adapted for access by executable programs on a computer for generating the implementation of said heterogeneous essentially digital system, comprising a plurality of objects representing aspects of said digital system wherein said objects comprise primitive objects representing the specification of said digital system and hierarchical objects being created by said executable programs while generating the implementation of said digital system, said hierarchical objects being refinements of said primitive objects and having more detail and preserving any one or all of said aspects to thereby generate said implementation of said digital system; and further comprising relations in-between said primitive objects and in-between said hierarchical objects and between said primitive objects and said hierarchical objects; and further comprising functions for manipulating said objects and said relations. Said system further comprises means for specifying said heterogeneous digital system comprising a plurality of behavioral and structural languages; means for simulating said heterogeneous digital system comprising a plurality of simulators for said behavioral and structural languages; means for implementing said heterogeneous digital system comprising a plurality of compilers for said behavioral and structural languages; means for allocating hardware components for an implementation of said heterogeneous digital system; means for assigning hardware subsystems and software subsystems of said heterogeneous digital system to said hardware components; means for implementing the communication between said software subsystems and said hardware subsystems, one of the aspects of said communication being represented by ports; means for encapsulating said simulators, said compilers, said hardware components, said hardware subsystems and said software subsystems, thereby creating a consistent communication between said encapsulated simulators, compilers, hardware components, hardware subsystems and software subsystems; and means for creating processor models of said hardware components as objects in said database, said models comprising software models representing the software views on said hardware components and hardware models representing the hardware views on said hardware components.
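The database of [0041] (primitive objects, hierarchical objects that refine them, relations between objects, and functions manipulating both) can be sketched as a small in-memory data structure. Everything here is a hypothetical illustration: `DbObject`, `DesignDatabase`, the relation names `"refines"` and `"contains"`, and the rule that a refinement inherits the kind of the object it refines are assumptions, not the disclosed data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DbObject:
    name: str
    kind: str                        # e.g. "process", "port" (illustrative labels)
    primitive: bool                  # True for specification objects
    refines: Optional[str] = None    # object this hierarchical object refines
    children: List[str] = field(default_factory=list)


class DesignDatabase:
    def __init__(self) -> None:
        self.objects: Dict[str, DbObject] = {}
        self.relations: List[Tuple[str, str, str]] = []   # (source, relation, target)

    def add_primitive(self, name: str, kind: str) -> DbObject:
        """Enter an object of the original specification."""
        self.objects[name] = DbObject(name, kind, primitive=True)
        return self.objects[name]

    def relate(self, src: str, rel: str, dst: str) -> None:
        self.relations.append((src, rel, dst))

    def refine(self, name: str, original: str, children: List[str]) -> DbObject:
        """Create a hierarchical object that refines `original` using existing objects."""
        for n in [original, *children]:
            if n not in self.objects:
                raise KeyError(n)
        obj = DbObject(name, self.objects[original].kind, primitive=False,
                       refines=original, children=list(children))
        self.objects[name] = obj
        self.relate(name, "refines", original)
        for c in children:
            self.relate(name, "contains", c)
        return obj

    def refinement_chain(self, name: str) -> List[str]:
        """Follow `refines` links back to the primitive specification object."""
        chain = [name]
        while self.objects[chain[-1]].refines is not None:
            chain.append(self.objects[chain[-1]].refines)
        return chain
```

Keeping the `refines` links lets any implementation object be traced back to the specification object it preserves, which is one way to read "preserving any one or all of said aspects".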
[0042] In an aspect of the present invention, the design environment further comprises means for creating I/O scenario models of said ports as objects in said database, said I/O scenario models representing the implementation of said ports on said hardware components, said implementation comprising software subsystems, hardware subsystems, and processor models with connections therebetween.

[0043] In another aspect of the present invention, the implementation of the communication between a first software subsystem and a first hardware subsystem in the design environment results in said first software subsystem with a first port being replaced by a second hardware subsystem with a second port, said first port and said second port representing an essentially identical communication.
[0044] In yet another aspect of the present invention, the design environment further comprises means for selecting I/O scenario models for the ports of said first software subsystem; means for combining the software subsystems of said selected I/O scenarios; and means for combining the hardware subsystems of said selected I/O scenarios.
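The selection and combination steps of [0044] can be sketched as follows, assuming (for illustration only) that each I/O scenario model lists the software subsystems and hardware subsystems it needs, and that a scenario is chosen per port by a fixed preference order over scenario kinds. `IOScenario`, `select_scenarios`, `combine` and the subsystem names are hypothetical; a real environment could select by cost or by interactive designer choice instead. Combining removes duplicates so that, for instance, one address decoder serves several memory-mapped ports.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class IOScenario:
    """Hypothetical I/O scenario model: how one port is realised on a processor."""
    name: str
    kind: str                        # e.g. "memory_mapped", "interrupt" (illustrative labels)
    sw_subsystems: Tuple[str, ...]   # e.g. driver routines run on the processor
    hw_subsystems: Tuple[str, ...]   # e.g. decoders, registers, interrupt logic


def select_scenarios(candidates: Dict[str, List[IOScenario]],
                     preference: List[str]) -> Dict[str, IOScenario]:
    """Pick one scenario per port; the rank-by-preference rule is an assumption."""
    def rank(s: IOScenario) -> int:
        return preference.index(s.kind) if s.kind in preference else len(preference)

    chosen = {}
    for port, options in candidates.items():
        if not options:
            raise ValueError(f"no I/O scenario available for port {port!r}")
        chosen[port] = min(options, key=rank)
    return chosen


def combine(chosen: Dict[str, IOScenario]) -> Tuple[List[str], List[str]]:
    """Merge the SW and HW subsystems of all selected scenarios, sharing duplicates."""
    sw: List[str] = []
    hw: List[str] = []
    for port in sorted(chosen):
        for part in chosen[port].sw_subsystems:
            if part not in sw:
                sw.append(part)
        for part in chosen[port].hw_subsystems:
            if part not in hw:
                hw.append(part)
    return sw, hw
```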
[0045] In the design environment, a first I/O scenario model can represent the connection of said first port to said second port, said connection comprising a connection of said first port to said software subsystems of said I/O scenario model, connections of said software subsystems of said I/O scenario model to said software model, connections of said hardware model to said hardware subsystems of said I/O scenario model, and a connection of said hardware subsystems of said I/O scenario model to said second port.
[0046] In the design environment, the I/O scenario models can further comprise memory mapped I/O scenarios, instruction programmed I/O scenarios, and interrupt based I/O scenarios.

[0047] In an aspect of the present invention, said implementation is a simulation of said digital system. Said simulation can be a multi-abstraction level simulation, said multi-abstraction level simulation comprising substantially simultaneous low-level and high-level simulation.
[0048] Said simulation can be a multi-platform simulation being executed on a plurality of computers.
[0049] Said simulation can be a hybrid simulation comprising substantially simultaneous hardware implementations and computer simulations.
[0050] Said implementation can be a heterogeneous implementation comprising hardware subsystems and software subsystems, said software subsystems being executed on one or more of said hardware subsystems.
[0051] Said hardware subsystems can comprise 
any one or more of processor cores, off-the-shelf com- 
ponents, custom components, ASICs, processors, or 
boards. 

[0052] Said software subsystems can comprise machine instructions for said hardware subsystems.

Short description of the drawings 

[0053]

FIGURE 1 is a schematic representation of a heterogeneous digital system comprising various specification paradigms.

FIGURE 2 is a flowchart representing the method for hardware/software co-design of the present invention.

FIGURE 3 is an illustration of the primitive objects in the database and the relations in between said primitive objects.

FIGURE 4 is an illustration of the hierarchical objects in the database and the relations in between said hierarchical objects and between the primitive and hierarchical objects.

FIGURE 5 is an illustration of the process merge transformation.

FIGURE 6 is a flowchart of a specific embodiment of the implementation process for hardware/software co-design.

FIGURE 7 is a schematic representation of the functionality of the hardware/software interface generation.

FIGURE 8 is a schematic representation of a particular I/O scenario modeled in the database.

FIGURE 9 is a flowchart of a specific embodiment of the construction of a hardware/software co-simulation.

FIGURE 10 is a block diagram of a typical heterogeneous digital system: the pager application.

FIGURE 11 is a schematic representation of the pager application as described with the present invention.

FIGURE 12 is a block diagram of the pager after application of the process merge transformation.

FIGURE 13 is a block diagram of the pager after the communication channels have been tagged with a specific communication behavior.

FIGURE 14 is an illustration of the introduction of specific communication behavior in the pager application by refining a primitive channel into a hierarchical channel.

FIGURE 15 is an illustration of the implementation of a process in hardware, whereby the resulting hardware subsystems are encapsulated to make them communicate.

FIGURE 16 shows the details of the encapsulation of the hardware subsystems in this particular application.

FIGURE 17 is an illustration of the generation of a hardware/software interface between a software subsystem compiled on an ARM processor core and a hardware subsystem.

Detailed description of the invention

[0054] In the sequel, a design environment and a design methodology are disclosed that meet the following requirements: modularity; encapsulation of different description languages; modeling from a heterogeneous conceptual specification to a resulting heterogeneous architecture, including all refinement steps in between; modeling capabilities for off-the-shelf components and the associated design environments; separation between functional and communication behavior; and processor-independent interface synthesis. In the sequel, said design environment is called CoWare.

[0055] In the sequel, it is to be understood that the concept of refinement means converting, translating or transforming a specification of an electronic system into an implementation. Said implementation can be an architecture of components that has the same behavior as said specification or that executes said specification. Said implementation can also be an additional stage in the chain leading to a final implementation. Adding detail means that an implementation is made more specific or concrete as a result of an implementation decision on a previous level in the chain leading to a final implementation. Adding detail can also mean adding a material object, such as a specific component or a specific communication in between components, as opposed to an abstract aspect on a previous level in the chain leading to a final implementation. Other instances of the concepts of refinement and detail are found in the sequel.

[0056] FIGURE 2 shows the architecture of the 
CoWare system. 

[0057] The CoWare system supports four major design activities: co-specification 8, co-simulation 9, co-synthesis 10 and interface synthesis 11. The input is a heterogeneous specification of an electronic system; the output 12 is a netlist for prior-art commercial tools that generate the implementation layout. Said output preferably comprises structural VHDL or Verilog and machine code for the programmable processors.
[0058] The CoWare design environment is implemented on top of a data model in which modularity is provided by means of processes. Processes contain host language encapsulations which are used to describe the system components. Communication between processes takes place through a behavioral interface comprising ports. For two processes to be able to communicate, their ports must be connected with a channel. The inter-process communication semantics is based on the concept of the Remote Procedure Call (RPC). The data model is hierarchically structured and allows channels, ports, and protocols to be refined into lower level objects, adding detail. We refer to the most abstract object as a primitive object. An object that contains more implementation detail is referred to as a hierarchical object.

[0059] We first discuss the primitive objects. The 
hierarchical objects are used to refine the communica- 
tion behavior of the system and are discussed after- 
wards. 

[0060] A process is a container for a number of host 
language encapsulations of a component. A single 
process can have multiple host language encapsula- 
tions describing different implementations for the same 
component, or for the same component represented at 
different abstraction levels. 

[0061] A host language encapsulation describes a 
component in a specific host language. Preferably C, 
C++, DFL, VHDL and Verilog are supported host lan- 
guages. A CoWare language encapsulation is used to 
describe the system's structure. In a CoWare language 
encapsulation, one can instantiate processes and con- 
nect their ports with channels. 

[0062] Other host language encapsulations comprise a context and a number of threads. The context and the threads contain code written in the host language of the encapsulation. The context contains code that is common to all threads in the encapsulation, i.e. variables/signals and functions as allowed by the semantics of the host language. As such, the context provides for inter-thread (intra-process) communication.
[0063] Each primitive CoWare process (symbolized by an ellipse 13 in FIGURE 2) encapsulates concurrent program threads in a host language of choice. Concurrent threads communicate over shared memory inside a process. Inter-process communication is over uni-directional channels using a Remote Procedure Call (RPC) protocol. The reasons for this choice are explained below.

[0064] Notice that in this way heterogeneous speci- 
fication is supported: both hardware and software 
aspects, structural and behavioral aspects, and different 
specification paradigms (data-flow, control-flow,...) can 
be combined. 

[0065] Co-specification allows a functional specification to be described based on the concept of communicating CoWare processes.

[0066] An important concept of CoWare is that basically no distinction is made between co-simulation and co-synthesis. Both are based on the concept of refining the specification for implementation, re-using existing compilers, emulators, and simulation processes.
[0067] In refinement for co-synthesis, the designer performs an interactive coarse partitioning of the specification over a user-allocated architecture. This leads to a merger of component-compiler-consistent processes to be mapped on the same component. Component-compiler-consistent processes have an encapsulation in the same host language. Merging consists of in-lining the RPC calls between said processes and leads to two subproblems: the mapping of the concurrent threads in the processes on a processor, re-using existing component compilers 14, 15, 16, 17, 18, 19, and the refinement of the communication between processes into the hardware and software communication protocols that implement it. The implementation of concurrent threads and intra-process communication must be taken care of by using Real-Time Operating Systems (RTOS), micro-kernels or software synthesis in the case of programmable processors, or by providing a library-based communication protocol shell around the existing hardware synthesis tools. Refinement of the inter-process communication again means a refinement of the primitive RPC communication, by expanding the communication ports into implementable protocols available in a protocol library 20. It is also possible to assign channel processes to abstract channels.
[0068] In principle all of this is open to the user, who can add his own library for communication protocols. On the other hand, CoWare provides in the SYMPHONY toolbox 21 a methodology for interface synthesis whereby every communication channel is refined by selection of a communication scenario. In this way automated synthesis of hardware/hardware and hardware/software interfaces, including the generation of the software drivers in programmable processors, is possible. This is an essential part of hardware/software co-design.

[0069] After the compilation of all components, all hardware is available as structural VHDL and all software for the processors is in C, which can be compiled on the host compiler of the programmable components. The final step is to link all the synthesis and hardware descriptions to drive commercial back-end tools that generate the layout.

[0070] In FIGURE 3, the processes system 22 and subsystem 23 contain a CoWare language encapsulation. The CoWare language encapsulation of system 22 describes how it is built up from an instance of subsystem 23 and an instance of P4 (24). The processes P1 (25), P23 (26), and P4 (24) each contain a C language encapsulation.

[0071] Ports are objects through which processes communicate. A primitive port is characterized by a protocol and a data type parameter.

[0072] There is one implicit port, the construct port, 
to which an RPC is performed exactly once at system 
start-up. 
[0073] In FIGURE 3 the process P23 (26) has two primitive ports p2 (27) and p3 (28), next to the implicit construct port.

[0074] Protocols define the communication semantics of a port. A primitive protocol is one of master, inmaster, outmaster, inoutmaster, slave, inslave, outslave, inoutslave. Each primitive protocol indicates a different way of data transport. The in, out, and inout prefix indicates the direction of the data. The master/slave postfix indicates the direction of the control: whether the protocol activates an RPC (master) or services an RPC (slave). In the remainder of this text, ports with a slave/master protocol are usually referred to as slave/master ports.
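Because the protocol names encode two independent properties, they can be decoded mechanically. The following is a minimal Python sketch, assuming a protocol is identified by its name string; `classify_protocol` is a hypothetical helper, not part of the CoWare API, and it reports `None` as the data direction for the plain master and slave protocols, which have no in/out/inout prefix.

```python
from typing import Optional, Tuple

# The eight primitive protocols listed in [0074].
PRIMITIVE_PROTOCOLS = {
    "master", "inmaster", "outmaster", "inoutmaster",
    "slave", "inslave", "outslave", "inoutslave",
}

def classify_protocol(name: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split a primitive protocol name into
    (data direction, control direction)."""
    if name not in PRIMITIVE_PROTOCOLS:
        raise ValueError(f"not a primitive protocol: {name!r}")
    # Postfix = control direction: a master activates an RPC, a slave services it.
    control = "master" if name.endswith("master") else "slave"
    # Prefix = data direction: "in", "out", "inout", or no prefix at all.
    prefix = name[: -len(control)]
    return (prefix or None, control)

print(classify_protocol("outmaster"))  # ('out', 'master'), like port p1 in FIGURE 3
print(classify_protocol("inslave"))    # ('in', 'slave'), like port p2 in FIGURE 3
```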

[0075] In FIGURE 3 master ports (29, 30) are represented by the small shaded rectangles on a process' perimeter. Slave ports (27, 28) are represented by small open rectangles on the perimeter. The data direction of a port is represented by the arrow that connects to a port. In FIGURE 3 port p1 (29) is an outmaster port and port p2 (27) is an inslave port.

[0076] A protocol may further have an index set. The indices in the index set are used to convey extra information about the data that is transported. For example, the primitive protocol used to model the memory port of a processor will have an index to model the address of the data that is put on the memory port.
[0077] A thread is a single flow of control within a process. A thread contains code in the host language of the encapsulation of which the thread is a part. The code in a thread is executed according to the semantics of the host language. We distinguish between slave threads and autonomous threads.

[0078] Slave threads are uniquely associated with slave ports and their code is executed when the slave port is activated (i.e. when an RPC is performed to the slave port). There is one special slave thread which is associated with the implicit construct port and can be used to initialize the process.

[0079] In FIGURE 3 the process P23 (26) contains two regular slave threads (31, 32) associated with the slave ports p2 (27) and p3 (28) respectively, next to the special construct slave thread (33).

[0080] Autonomous threads are not associated with any port and their code is executed, after system initialization, in an infinite loop.

[0081] In FIGURE 3 processes P1 (25) and P4 (24) each contain an autonomous thread (34).

[0082] A language encapsulation can contain multiple slave and autonomous threads that, in principle, all execute concurrently.
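The thread kinds described above can be pictured as a small data structure. This Python sketch is an illustration only: `Process` and `run_system` are hypothetical names, and a bounded round-robin loop stands in for the concurrent, infinite execution of autonomous threads so that the example terminates.

```python
from typing import Callable, Dict, List

class Process:
    """Hypothetical in-memory model of a CoWare process: a shared context,
    a construct thread, slave threads keyed by slave port name (run only
    when an RPC reaches their port), and autonomous threads."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.context: Dict[str, object] = {}   # shared by all threads of the process
        self.construct: Callable[["Process"], None] = lambda p: None
        self.slave_threads: Dict[str, Callable[["Process", object], None]] = {}
        self.autonomous_threads: List[Callable[["Process"], None]] = []

def run_system(processes: List[Process], iterations: int) -> None:
    """Run every construct thread once, then the autonomous threads.
    Interleaving them round-robin for a fixed number of iterations is a
    simplification of their concurrent, endless execution."""
    for p in processes:
        p.construct(p)          # the single RPC to the implicit construct port
    for _ in range(iterations):
        for p in processes:
            for thread in p.autonomous_threads:
                thread(p)       # one pass through the thread's loop body

counter = Process("P1")
counter.construct = lambda p: p.context.update(count=0)
counter.autonomous_threads.append(
    lambda p: p.context.update(count=p.context["count"] + 1))
run_system([counter], iterations=3)
print(counter.context["count"])  # 3
```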

[0083] A channel is a point-to-point connection of a master port and a slave port. Two ports that are connected by a channel can exchange data. Channels can be uni- or bi-directional. A primitive channel provides for unbuffered communication. It has no behavior: it is a medium for data transport. In hardware it is implemented with wires. In software it is implemented with a (possibly in-lined) function call. In this way, primitive channels model the basic communication primitives found in software and hardware description languages.

[0084] In the strict sense only point-to-point channels connecting one master to one slave port are allowed. However, a person skilled in the art can easily remove this restriction to allow channels connecting two master ports or two slave ports, or to allow channels connecting multiple slave and master ports.

[0085] Such an extended description can be transformed into the basic model by using a default or user-defined translation scheme.

[0086] In FIGURE 3, there is a primitive channel 
(35) that connects port p1 (29) of process P1 (25) with 
port p2 (27) of process P23 (26). 
[0087] Communication always happens between 
two threads. Communication between threads that are 
part of the same process is denoted as intra-process 
communication. Communication between threads in dif- 
ferent processes is denoted as inter-process communi- 
cation. 

[0088] Intra-process (inter-thread) communication is done by making use of shared variables/signals that are declared in the context of the process. How two threads are prevented from accessing the same variable at the same time depends on the host language. It is the user's responsibility to protect critical sections using the mechanisms provided in the host language.

[0089] In FIGURE 3, intra-process communication 
occurs in process P23 (26). 

[0090] The variable tmp (36) declared in the context 
(37) is shared by slave thread p2 (31) and slave thread 
p3 (32). 
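As an illustration of protecting such a shared context variable, the sketch below uses Python as the host language and `threading.Lock` as its protection mechanism. The slave thread bodies and the helper names are hypothetical, since the actual code of threads (31) and (32) is not given here; only the sharing of tmp follows FIGURE 3.

```python
import threading

class P23Context:
    """Hypothetical context of process P23: the variable tmp is shared by the
    slave threads of ports p2 and p3. The lock stands for the host-language
    mechanism the user must add to protect critical sections."""

    def __init__(self) -> None:
        self.tmp = 0
        self.lock = threading.Lock()

def slave_thread_p2(ctx: P23Context, data: int) -> None:
    with ctx.lock:                # critical section: read-modify-write of tmp
        ctx.tmp = ctx.tmp + data  # hypothetical body

def slave_thread_p3(ctx: P23Context, data: int) -> None:
    with ctx.lock:                # critical section: read-modify-write of tmp
        ctx.tmp = ctx.tmp - data  # hypothetical body

ctx = P23Context()

def activate_p2() -> None:        # stands in for 1000 RPCs arriving on p2
    for _ in range(1000):
        slave_thread_p2(ctx, 3)

def activate_p3() -> None:        # stands in for 1000 RPCs arriving on p3
    for _ in range(1000):
        slave_thread_p3(ctx, 1)

workers = [threading.Thread(target=activate_p2), threading.Thread(target=activate_p3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(ctx.tmp)  # 2000
```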

[0091] Inter-process (inter-thread) communication with a primitive protocol is RPC based. On a master port, the RPC function can be used to initiate a thread in a remote process. A master port can be accessed from anywhere in the host language encapsulation (context, autonomous threads, slave threads) with the exception of the construct thread.

[0092] The RPC function returns when the slave 
thread has completed, i.e. when all the statements in 
the slave thread's code are executed. In the slave 
thread (uniquely associated with a slave port), the Read 
and Write functions can be used to access the data of 
the slave port. 

[0093] The Index function is used to access the 
indices of the protocol of the port. The RWbar function 
is used on an inoutslave port to determine the actual 
direction of the data transport. A slave port can only be 
accessed from within its associated slave thread. 
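For the memory port example of an indexed inoutslave protocol, a slave thread might use Index and RWbar roughly as follows. This is a Python sketch under stated assumptions: `InoutSlaveActivation`, `MemoryProcess` and `mem_slave_thread` are hypothetical names, the RPC's index, direction and data are passed in as a record instead of being fetched with Index, RWbar and Read, the return value plays the role of Write, and the encoding "rw_bar is True when the master reads" is assumed.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class InoutSlaveActivation:
    """Hypothetical snapshot of one RPC arriving on an inoutslave port.
    index  : what Index() would return, here the memory address
    rw_bar : what RWbar() would return; True is assumed to mean the master reads
    data   : what Read() would return when the master writes"""
    index: int
    rw_bar: bool
    data: Optional[int] = None

class MemoryProcess:
    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}          # context of the process

    def mem_slave_thread(self, port: InoutSlaveActivation) -> Optional[int]:
        """Slave thread of a hypothetical 'mem' port; the return value stands
        for the data written back to the master."""
        address = port.index                     # Index(): address of the data
        if port.rw_bar:                          # RWbar(): master reads
            return self.cells.get(address, 0)    # unwritten cells read as 0 (assumption)
        if port.data is None:
            raise ValueError("write access without data")
        self.cells[address] = port.data          # master writes: Read() the data
        return None

mem = MemoryProcess()
mem.mem_slave_thread(InoutSlaveActivation(index=0x10, rw_bar=False, data=42))
print(mem.mem_slave_thread(InoutSlaveActivation(index=0x10, rw_bar=True)))  # 42
```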
[0094] A bi-directional port can be used to both send and receive data.

[0095] However, according to the strict RPC semantics this cannot be done by the same RPC call. In a single RPC call, one uses the bi-directional port either in the input or in the output direction, but not in both directions. For a person skilled in the art, it is easy to extend the strict RPC semantics to full-fledged function call semantics, where arguments are passed to a remote procedure and results are received back.
[0096] In FIGURE 3, inter-process communication occurs between processes P1 (25) and P23 (26) over the channel (35). When the RPC statement (38) in the autonomous thread (34) is reached, the value of the local variable data (39) is put on the channel (35), and the control is transferred to the slave thread p2 (31). The autonomous thread (34) is halted until the last statement of the slave thread (31) is executed. After that, the autonomous thread (34) resumes by executing the statement (40) after the RPC statement (38).
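The blocking behavior of an RPC can be made explicit with two host threads. The Python sketch below is only a model of the semantics: `PrimitiveChannel` and its methods are hypothetical names, and a `threading.Condition` handshake represents the halting of the calling thread. As stated above, a real software implementation of a primitive channel is simply a (possibly in-lined) function call.

```python
import threading
from typing import Any, Callable, List

class PrimitiveChannel:
    """Hypothetical model of a primitive channel between a master port and a
    slave port: rpc() puts a value on the channel, activates the slave thread,
    and halts the caller until the slave thread's last statement has run."""

    def __init__(self, slave_thread: Callable[[Any], None]) -> None:
        self._slave_thread = slave_thread
        self._cond = threading.Condition()
        self._value: Any = None
        self._pending = False   # an RPC is waiting to be serviced
        self._done = False      # the slave thread has completed
        threading.Thread(target=self._serve, daemon=True).start()

    def rpc(self, value: Any) -> None:
        with self._cond:
            self._value, self._pending, self._done = value, True, False
            self._cond.notify_all()
            self._cond.wait_for(lambda: self._done)   # the caller is halted here

    def _serve(self) -> None:
        while True:
            with self._cond:
                self._cond.wait_for(lambda: self._pending)
                value, self._pending = self._value, False
            self._slave_thread(value)                 # run the slave thread's code
            with self._cond:
                self._done = True
                self._cond.notify_all()

log: List[tuple] = []
channel = PrimitiveChannel(lambda v: log.append(("slave thread p2 got", v)))
data = 7
channel.rpc(data)                  # RPC statement (38)
log.append(("statement (40)",))    # runs only after the slave thread has ended
print(log)  # [('slave thread p2 got', 7), ('statement (40)',)]
```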
[0097] By using primitive channels, ports, and pro- 
tocols, the designer first concentrates on the functional- 
ity of the system while abstracting from terminals, 
signals, and handshakes. Once the designer is con- 
vinced that the processes of the system are functionally 
correct, the communication behavior of the system can 
be refined. Communication refinement in CoWare is 
carried out by making the objects involved in the com- 
munication (channels, ports, and protocols) hierarchi- 
cal. 

[0098] Hierarchical channels are processes that assign a given communication behavior to a primitive channel. The behavioral interface of a hierarchical channel is fixed by the ports connected by the primitive channel. Making a channel hierarchical can drastically change the communication behavior of two connected processes. It can, for example, parallelize (pipeline) the processes by adding buffers. The one property that is preserved by making a channel hierarchical is the direction of the data transport.

[0099] In FIGURE 4, the primitive channel (41) between processes P1 (42) and P23 (43) is refined into a hierarchical channel (44) with FIFO behavior. The FIFO hierarchical channel (44) decouples the autonomous thread of process P1 (42) and the slave thread associated with port p2 (45) of process P23 (43). The effect is that the rate at which process P1 (42) can issue RPCs is no longer determined by the rate at which process P23 (43) can service the RPCs. The FIFO hierarchical channel (44) takes care of the necessary buffering of data.
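The decoupling effect of a FIFO hierarchical channel can be sketched with a bounded queue. In this Python illustration, `FifoChannel`, `left_slave_thread`, `drain` and the FIFO depth are hypothetical; the channel's slave port returns as soon as the data is buffered, and an internal autonomous thread forwards the data to P23 one RPC at a time.

```python
import queue
import threading
from typing import Any, Callable, List

class FifoChannel:
    """Hypothetical model of the FIFO hierarchical channel (44)."""

    def __init__(self, depth: int, p2_slave_thread: Callable[[Any], None]) -> None:
        self._buffer: "queue.Queue[Any]" = queue.Queue(maxsize=depth)
        self._p2_slave_thread = p2_slave_thread
        threading.Thread(target=self._forward, daemon=True).start()

    def left_slave_thread(self, data: Any) -> None:
        """Services an RPC from P1; blocks only while the FIFO is full."""
        self._buffer.put(data)

    def _forward(self) -> None:
        while True:                       # autonomous thread: infinite loop
            data = self._buffer.get()
            self._p2_slave_thread(data)   # RPC towards port p2 of P23
            self._buffer.task_done()

    def drain(self) -> None:
        """Demo helper: wait until every buffered item has been serviced."""
        self._buffer.join()

received: List[int] = []
fifo = FifoChannel(depth=4, p2_slave_thread=received.append)
for sample in range(6):
    fifo.left_slave_thread(sample)        # P1 is not paced by P23 while there is room
fifo.drain()
print(received)  # [0, 1, 2, 3, 4, 5]
```

The depth bounds how far P1 can run ahead of P23; once the buffer is full, P1's RPC waits again, which preserves the direction and order of the data transport.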

[0100] Hierarchical ports are processes that assign a given communication behavior to a primitive port. The behavioral interface of the hierarchical port is partially fixed by the primitive port it refines. The hierarchical port process should have one port, which we call the return port, that is compatible with the primitive port. Making a primitive port hierarchical preserves the data direction (in/out). Two ports are compatible if their primitive protocols are compatible, if they have equal data types, and if they have equal protocol indices. The following primitive protocols are compatible: (master, slave); (inslave, outmaster); (inslave, inoutmaster); (outslave, inmaster); (outslave, inoutmaster); (inoutslave, inmaster); (inoutslave, outmaster); (inoutslave, inoutmaster). Two hierarchical protocols are compatible if their primitive protocols are compatible and they have the same name.
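The compatibility rules above translate directly into a check. This Python sketch encodes the listed pairs; the `Port` record and the function names are hypothetical, and treating a primitive protocol and a hierarchical protocol as incompatible with each other is an assumption, since the text only covers the primitive/primitive and hierarchical/hierarchical cases.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple

# Compatible primitive protocol pairs, as listed in [0100] (order-independent).
COMPATIBLE_PAIRS: Set[FrozenSet[str]] = {
    frozenset(pair) for pair in [
        ("master", "slave"),
        ("inslave", "outmaster"), ("inslave", "inoutmaster"),
        ("outslave", "inmaster"), ("outslave", "inoutmaster"),
        ("inoutslave", "inmaster"), ("inoutslave", "outmaster"),
        ("inoutslave", "inoutmaster"),
    ]
}

@dataclass(frozen=True)
class Port:
    primitive_protocol: str
    data_type: str
    indices: Tuple[str, ...] = ()
    hierarchical_protocol: Optional[str] = None   # e.g. "RS232"; None if primitive

def protocols_compatible(a: Port, b: Port) -> bool:
    if frozenset((a.primitive_protocol, b.primitive_protocol)) not in COMPATIBLE_PAIRS:
        return False
    # Hierarchical protocols must also have the same name; a mix of primitive
    # and hierarchical protocols is treated as incompatible (assumption).
    return a.hierarchical_protocol == b.hierarchical_protocol

def ports_compatible(a: Port, b: Port) -> bool:
    return (protocols_compatible(a, b)
            and a.data_type == b.data_type
            and a.indices == b.indices)

p1 = Port("outmaster", "int")
p2 = Port("inslave", "int")
print(ports_compatible(p1, p2))                       # True
print(ports_compatible(p1, Port("outslave", "int")))  # False
```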
[0101] In FIGURE 4 we impose a certain data formatting for the data transported over the channel (41) between process P1 (42) and the FIFO hierarchical channel (44). This is achieved by making the primitive ports p1 (46) and left (47) hierarchical. The format process (48) that refines port p1 (46) might, for example, add a cyclic redundancy check to the data that is transported. The unformat process (49) that refines port left (47) of the FIFO hierarchical channel (44) then uses this cyclic redundancy check to determine whether the received data is valid. The actual data and the cyclic redundancy check are sent sequentially over the same primitive channel between ports op (50) and ip (51). As a consequence, the data rate between the format (48) and unformat (49) processes is twice that of process P1 (42).
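The format/unformat pair can be sketched as follows. The choice of CRC-32 (via Python's `zlib.crc32`) is an assumption, since the text only speaks of a cyclic redundancy check; `format_p1`, `Unformat` and `op_to_ip` are hypothetical names. The sketch also shows why the rate doubles: every RPC from P1 becomes two transfers between op and ip.

```python
import zlib
from typing import Callable, List, Optional

def crc_bytes(data: bytes) -> bytes:
    """CRC-32 of the data as 4 big-endian bytes (CRC variant is an assumption)."""
    return zlib.crc32(data).to_bytes(4, "big")

def format_p1(data: bytes, op_rpc: Callable[[bytes], None]) -> None:
    """Hypothetical slave thread of the format process (48): one RPC from P1
    becomes two RPCs on port op, the data followed by its CRC."""
    op_rpc(data)
    op_rpc(crc_bytes(data))

class Unformat:
    """Hypothetical unformat process (49): pairs the data and CRC received on
    port ip and forwards only data whose CRC matches."""

    def __init__(self, forward: Callable[[bytes], None]) -> None:
        self._forward = forward
        self._pending: Optional[bytes] = None
        self.errors = 0

    def ip_slave_thread(self, word: bytes) -> None:
        if self._pending is None:        # first transfer of a pair: the data
            self._pending = word
            return
        data, self._pending = self._pending, None
        if crc_bytes(data) == word:      # second transfer: the CRC
            self._forward(data)
        else:
            self.errors += 1

to_fifo: List[bytes] = []
unformat = Unformat(to_fifo.append)
transfers: List[bytes] = []

def op_to_ip(word: bytes) -> None:       # primitive channel between op and ip
    transfers.append(word)
    unformat.ip_slave_thread(word)

for sample in (b"\x01\x02", b"\x03\x04"):
    format_p1(sample, op_to_ip)
print(len(transfers), to_fifo)  # 4 [b'\x01\x02', b'\x03\x04']
```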

[0102] Primitive protocols provide a classification of all hierarchical protocols. A primitive protocol determines the communication semantics, but not the communication implementation: it does not fix the timing diagram used in the communication. Hierarchical protocols refine primitive protocols with a timing diagram and the associated I/O terminals. Hierarchical protocols are high level models for alternative implementations of a primitive protocol: they preserve both the data direction and the control direction of the primitive protocol.
[0103] To access the terminals of the hierarchical protocol, a hierarchical port is introduced at the same time. The terminals can be accessed from within the thread code by using the functions Put, Sample, and Wait.

[0104] In FIGURE 4, the primitive protocols of the op port (50) and the ip port (51) of the format (48) and unformat (49) processes are refined into an RS232 protocol (52). In the RS232 hierarchical port (53), an RPC (54) issued in the format process (48) on the op port (50) is converted into manipulations of the terminals (55) according to a timing diagram (56).
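A hierarchical port of this kind turns one RPC into a sequence of terminal manipulations. The Python sketch below assumes a single transmit terminal named `txd` and common asynchronous serial framing (start bit 0, eight data bits least significant bit first, stop bit 1); the actual timing diagram (56) is not specified here. The `put` and `wait` callbacks are hypothetical stand-ins for the Put and Wait functions, whose real signatures are not given.

```python
from typing import Callable, List

def rs232_slave_thread(byte: int,
                       put: Callable[[str, int], None],
                       wait: Callable[[int], None],
                       bit_period: int = 1) -> None:
    """Hypothetical body of the RS232 hierarchical port (53): converts one RPC
    issued on op into levels on the terminal 'txd'. Assumed framing: start
    bit 0, eight data bits LSB first, stop bit 1."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("this sketch transports one byte per RPC")
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    for bit in bits:
        put("txd", bit)       # stand-in for Put: drive the terminal
        wait(bit_period)      # stand-in for Wait: hold the level for one bit period

levels: List[int] = []
rs232_slave_thread(0x41, put=lambda terminal, level: levels.append(level),
                   wait=lambda periods: None)
print(levels)  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```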
[0105] The CoWare model is implemented on a computer or on a plurality of computers, and a set of application programmer's interface (API) functions is available.

[0106] When a CoWare system description is parsed, a representation of the system in memory is created in which the objects of the description are related to each other. All tools of the CoWare environment use these API functions to analyze, manipulate, and refine the system description.
[0107] Due to the selection of RPC as inter-process communication, the classification of protocols, and the structuring of a process in encapsulations with context and threads, a process merge transformation can be implemented.

[0108] The goal of this transformation is to combine a number of process instances that are described in the same host language into a single host language encapsulation that can then be mapped by a host language compiler onto a single processor.
[0109] In the process of merging, all remote procedure calls are in-lined: each slave thread is in-lined in the code of the thread that calls it through an RPC statement. Because of the semantics of RPC communication, this transformation does not alter the behavior of the original system, provided that care is taken to avoid name clashes. The result of merging is a single host language encapsulation that contains a single context, a single construct thread, one or more autonomous threads, possibly multiple slave threads to service RPC requests from external processes (not involved in the process merge transformation), and possibly multiple RPCs to slave threads in external processes (not involved in the process merge transformation).
[0110] FIGURE 5 shows the effect of merging the process instances (25, 26) in the subsystem process (23) of FIGURE 3. The subsystem process (57) has a CoWare language encapsulation. After merging the instances (58, 59), we obtain a C language encapsulation (60) which is added to the subsystem process.
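The in-lining step can be illustrated with a before/after pair. In this Python sketch, functions stand in for C host language encapsulations; the process and port names follow FIGURE 3, but the thread bodies, the `sink` callback (an RPC to an external process not involved in the merge) and the bounded input list replacing the infinite loop are hypothetical. The point is that replacing the RPC by the statements of the slave thread, with context names prefixed to avoid clashes, leaves the observable behavior unchanged.

```python
from typing import Callable, Dict, Iterable

def unmerged(inputs: Iterable[int], sink: Callable[[int], None]) -> None:
    """Before merging: P1 and P23 with separate contexts and an RPC between them."""
    p23_context: Dict[str, int] = {"tmp": 0}

    def p23_slave_thread_p2(data: int) -> None:   # serves RPCs on port p2
        p23_context["tmp"] += data                 # hypothetical body
        sink(p23_context["tmp"])                   # RPC to an external process

    for data in inputs:                            # P1's autonomous thread (bounded)
        p23_slave_thread_p2(data)                  # RPC on port p1 over channel (35)

def merged(inputs: Iterable[int], sink: Callable[[int], None]) -> None:
    """After merging: one context, one autonomous thread, the slave thread
    in-lined at the RPC site, context names prefixed to avoid clashes."""
    context: Dict[str, int] = {"p23_tmp": 0}
    for data in inputs:
        context["p23_tmp"] += data                 # in-lined body of slave thread p2
        sink(context["p23_tmp"])                   # the external RPC remains

before, after = [], []
unmerged([1, 2, 3], before.append)
merged([1, 2, 3], after.append)
print(before, after)  # [1, 3, 6] [1, 3, 6]
```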
[0111] The benefit of merging processes is that the in-lining transformation eliminates the overhead that accompanies execution of (remote) procedure calls. It further reduces the number of concurrent threads and, therefore, the overhead that accompanies the switching between threads. Finally, it allows the host language compilers to optimize over the boundaries of the original processes.

[0112] The port and protocol hierarchy provides a clear separation between functional and communication behavior. Traditionally, the description of a component contains both functional and communication behavior in an interleaved way. When such a component has to be re-used in an environment other than the one it was intended for, the designer has to change those parts of the description of the component that have to do with communication. In CoWare, a component's behavior is described by a process that makes use of RPC to communicate with the outside world. Such processes can be connected with each other without modifying their description (modularity). By using primitive ports and primitive protocols, the designer concentrates on the functionality of the system while abstracting from terminals, signals, and handshakes.

[0113] Later, when the component is instantiated in a system, the primitive protocol is refined into the best suited hierarchical protocol, taking into account the other system components involved. This fixes the timing diagram and terminals used to communicate over that port. The port containing the hierarchical protocol is made hierarchical to add the required communication behavior that implements the timing diagram of the selected hierarchical protocol. Again, this is achieved without modifying the description of any of the processes involved.



[0114] Because of this property it is feasible to construct libraries of functional building blocks and libraries of communication blocks that are re-usable: they can be plugged together without modifying their description. After blocks have been plugged together, any communication overhead (chains of remote procedure calls) can be removed by in-lining the slave threads that serve the RPCs. The result is a description of the component in which function and communication are interleaved seamlessly and which can be compiled into software or hardware as efficiently as a description in the traditional design process.

[0115] The above method reduces the number of protocol conversions needed at the system level and allows the selection of the communication protocol and its implementation to be postponed until late in the design process, in this way achieving the requirements of "design for re-use". The concept of hierarchical protocols is also useful to model off-the-shelf components ("re-use of designs"), because the timing diagrams according to which a processor communicates are abstracted in it.

[0116] The input to the implementation refinement process is a functional specification: a CoWare language encapsulation consisting of a number of process instances (i.e. host language encapsulations), exhibiting both intra-process and inter-process communication behavior. In a first step, allocation is performed. In this step the number and type of processors are selected that will serve as the target for implementing the input specification. After allocating the necessary processor resources, an assignment step is performed. In this step each process instance of the input specification is assigned to one of the allocated processors.
[0117] The rest of the implementation path is illus- 
trated in FIGURE 6. 

[0118] All process instances bound to a single processor are merged. This results in a system with a one-to-one mapping between (merged) processes and allocated processors. In FIGURE 6, all DFL processes (61) are merged (62) into a single DFL process (63).
[0119] The host language encapsulation from each (merged) process instance now has to be compiled onto its processor target using a host language compiler. This comprises the following steps:

(1) The CoWare concepts of autonomous thread, slave thread, and shared context are implemented on the processor target.

(2) The inter-process communication is implemented.

(3) The resulting processors are encapsulated so that they can be connected with the rest of the system.

[0120] In step (1) existing (commercial) host language compilers are re-used. When such a host language compiler does not directly support the CoWare concepts of autonomous thread, slave thread, and shared context, the CoWare environment supports two alternatives:

- the host language compiler is extended with libraries that support such concepts (multi-thread library);

- software synthesis is performed to translate the host language encapsulation to a description that can be handled by the host language compiler.

[0121] In FIGURE 6, the DFL process (63) is compiled with the Cathedral compiler (64). The result is a VHDL netlist (65) of the implementation.
[0122] In step (2) when the process is compiled on 
a non-programmable processor, the implementation of 
inter-process communication comprises the steps of 
refining the primitive ports/protocols into appropriate 
hierarchical ports/protocols and merging the hierarchi- 
cal ports with the process. 

[0123] In FIGURE 6, hierarchical ports (66) are 
added to the VHDL processes (67). After merging (68) 
all processes (67), the resulting VHDL process (69) is 
compiled with the Synopsys compiler (70). 
[0124] In step (2), when the process is compiled on a programmable processor, the implementation of inter-process communication comprises the steps of generating device drivers and of generating hardware interfaces. A software tool is used to achieve this. In the sequel this tool is called SYMPHONY. FIGURE 7 illustrates the tool SYMPHONY.

[0125] SYMPHONY makes use of a software model (71) and a hardware model (72) of the target processor and a library of I/O scenarios (73) for that processor. The hardware model (72) of the programmable processor core consists of an HDL host-language encapsulation that formalizes the information that is available in the hardware section of the data sheet of the programmable processor core. The HDL host-language encapsulation of the hardware model is characterized by a behavioral interface that conforms to the hardware interface of the programmable processor. All ports (74) have hierarchical protocols: they consist of terminals and a timing diagram. The hardware model may also contain an HDL description of either a black box, a simulation model, or the full description of the processor core.
[0126] The software model (71) of the programmable
processor core consists of a software host-language
encapsulation that formalizes the information that is
available in the software section of the data sheet of the
programmable processor. The software host-language
encapsulation of the software model is characterized by
a behavioral interface that conforms to the software
interface of the programmable processor core. All ports
(75) have primitive protocols. The software model identifies,
for example, what ports can be used to get data into
or out of the processor core (memory-mapped, co-processor
port, ...), what ports can be used as interrupt ports,
and what their characteristics are (interrupt priority,
maskable interrupt, ...). In addition, the software model
contains a behavioral description that allows a software
host language encapsulation to be compiled into machine
code, for example functions to manage processor-specific
actions such as installing an interrupt vector or enabling
and disabling interrupts. FIGURE 7 shows the software
model of the ARM-6 RISC processor and a number of
its ports (75). The memory port mem is modeled as a
(bi-directional) slave port. This slave port is accessed by
the device drivers, by means of an RPC, to write/read data
to/from the external hardware. The slave thread attached
to the mem slave port, modeled in the software model,
translates the incoming RPC into a memory access.
SYMPHONY makes the connection between the device
drivers and the mem port. The fiq port is modeled as a
master port. The software model of the ARM processor
ensures that an RPC to this port is performed every
time the processor detects that an interrupt has
occurred. SYMPHONY connects the fiq port to a slave
thread that serves as the interrupt service routine, so
that this routine is started automatically.
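The behavior of the mem and fiq ports described above can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not the CoWare software model itself: the class and method names (ArmSoftwareModelSketch, mem, connect_fiq, on_interrupt_detected) are hypothetical, the external hardware behind the bus is reduced to a dictionary, and each RPC is modeled as a plain function call that returns when the callee has finished.

```python
from typing import Callable, Dict, Optional


class ArmSoftwareModelSketch:
    """Hypothetical, much-simplified stand-in for the ARM-6 software model."""

    def __init__(self) -> None:
        # Assumption: the external hardware reachable over the bus is a dict.
        self._bus: Dict[int, int] = {}
        # RPC target bound to the fiq master port (set by connect_fiq).
        self._fiq_target: Optional[Callable[[], None]] = None

    def mem(self, address: int, data: Optional[int] = None) -> Optional[int]:
        """Slave thread of the 'mem' slave port: turns an RPC into a memory access."""
        if data is None:
            return self._bus.get(address, 0)  # read access
        self._bus[address] = data             # write access
        return None

    def connect_fiq(self, isr: Callable[[], None]) -> None:
        """Bind the fiq master port to a slave thread acting as interrupt service routine."""
        self._fiq_target = isr

    def on_interrupt_detected(self) -> None:
        """Each detected interrupt results in an RPC on the fiq master port."""
        if self._fiq_target is not None:
            self._fiq_target()
```

A device driver would call `mem(address, data)` to write and `mem(address)` to read, while the interrupt service routine is started automatically once it has been connected to fiq.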
[0127] FIGURE 7 also shows the hardware model of the
ARM-6 RISC processor and a number of its ports (74).
The memory port mem is modeled as a (bi-directional)
master port. The hardware model of the ARM-6 performs
an RPC to this port every time it wants to write/read
data to/from the RAM, ROM or memory-mapped hardware.
SYMPHONY connects the mem port to a slave thread in
the hardware interface, which does the address decoding
and forwards the RPC to the appropriate hardware block
(RAM, ROM, memory-mapped hardware).

[0128] The fiq port is modeled as a slave port in the
hardware model. This slave port is activated by an RPC
that is performed to the port by the hardware interface.
The slave thread attached to the fiq port (and modeled
in the hardware model) sets the appropriate flag in the
status register to the appropriate value, signalling the
interrupt request.

[0129] The link between events (RPCs to ports,
starting of slave threads attached to slave ports) in the
hardware model and events in the software model is
taken care of by the processor hardware or, in the case of
simulation, by the instruction-set simulator for the ARM.
[0130] SYMPHONY is based on the observation
that programmable processors have a number of common
communication methods to get data into or out of the
processors. These communication methods are modeled
by I/O scenarios. An I/O scenario describes one
way of using the ports of a specific processor core to
map a particular port of a software host language
encapsulation to an equivalent port in hardware,
thereby crossing the processor core boundary while
maintaining the communication semantics. FIGURE 8
shows an example of an I/O scenario. It consists of a
software host-language encapsulation and a hardware
host-language encapsulation that describe a software
I/O driver and the hardware counterpart, respectively.
An I/O scenario is also tagged with performance
figures that allow the designer or SYMPHONY to
decide which I/O scenario to use for which port.

[0131] The I/O scenario (78) of FIGURE 8 shows 
how an outmaster port psw (79) in software can be 
mapped to an outmaster port phw (80) in hardware, 
thereby using the memory port (81) of an ARM-6 RISC 
processor core. The software process encapsulation P1
(82) represents the software I/O driver and copies data 
from port psw (79) to a specific memory address 
0x08000 via an RPC call (84). The hardware process 
encapsulation P2 (83) represents the hardware coun- 
terpart and checks whether the memory address bus of 
the ARM (modeled by the protocol indices of the mem- 
ory port (81)) equals 0x08000. If this is the case, data 
that is residing on the memory data bus of the ARM is 
copied to port phw (80) via an RPC call (85). 
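The FIGURE 8 scenario can be sketched as two cooperating functions: a software driver that turns an RPC on psw into a memory write at 0x08000, and a hardware decoder that watches the bus and forwards matching data to phw. This is a minimal Python sketch under assumptions: MemoryPortSketch, make_p1 and make_p2 are hypothetical names, only write cycles are modeled, and every bus observer sees every write.

```python
from typing import Callable, List

IO_ADDRESS = 0x08000  # memory address used by the I/O scenario of FIGURE 8


class MemoryPortSketch:
    """Hypothetical model of the ARM memory port: each write drives address and data buses."""

    def __init__(self) -> None:
        self.observers: List[Callable[[int, int], None]] = []

    def write(self, address: int, data: int) -> None:
        for observe in self.observers:
            observe(address, data)


def make_p1(mem: MemoryPortSketch) -> Callable[[int], None]:
    """Software I/O driver P1: an RPC on psw copies the data to IO_ADDRESS."""
    def psw(data: int) -> None:
        mem.write(IO_ADDRESS, data)
    return psw


def make_p2(phw: Callable[[int], None]) -> Callable[[int, int], None]:
    """Hardware counterpart P2: forwards the data bus to phw when the address matches."""
    def on_bus_cycle(address: int, data: int) -> None:
        if address == IO_ADDRESS:
            phw(data)
    return on_bus_cycle


# Usage: wire P1 and P2 through the memory port.
received: List[int] = []
mem = MemoryPortSketch()
mem.observers.append(make_p2(received.append))
psw = make_p1(mem)
psw(7)                  # software side: RPC on psw
mem.write(0x09000, 99)  # unrelated bus cycle, ignored by P2
```

After these calls `received` holds only the value 7, because P2 ignores bus cycles at other addresses.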
[0132] The library of I/O scenarios comprises: 

Memory-Mapped I/O scenarios. These provide a 
data-transfer mechanism that is convenient 
because it does not require the use of special proc- 
essor instructions, and can implement practically as 
many input or output ports as desired. In memory- 
mapped I/O, portions of the address space are 
assigned to input and output ports. Reads and 
writes to those addresses are interpreted as com- 
mands to the I/O ports. "Sending" to a memory- 
mapped location involves effectively executing a 
"Store" instruction on a pseudo-memory location 
connected to an output port, and "Receiving" from a 
memory-mapped location involves effectively exe- 
cuting a "Load" instruction on a pseudo-memory 
location connected to an input port. When these 
memory operations are executed on the portions of 
address space assigned to memory-mapped I/O, 
the memory system ignores the operation. The I/O 
unit, however, sees the operation and performs the 
corresponding operation to the connected I/O 
ports. The number of memory locations assigned 
for memory-mapped I/O will depend on the number 
of ports that a software processor component has 
to "physically" implement. SYMPHONY proposes 
an assignment of address locations to channels 
that will result in simple address decoding logic.
However, the user can always override the pro- 
posed assignment. 

Instruction-Programmed I/O scenarios. Some processors
also provide special instructions for accessing
special I/O ports provided with the processor
itself. Using this scheme, these special communication
ports of the processor are connected to the
external channels via the I/O unit. In addition to providing
hardware support for memory-mapped and
instruction-programmed I/O, the I/O unit also provides
support for hardware interrupt control. Interrupts
are used for different purposes, including the
coordination of interrupt-driven I/O transfers. Different
processors provide different degrees of hardware
interrupt support. Some processors provide
direct access to a number of dedicated interrupt
signals. Our I/O unit architecture makes use of
these signals when available. If more interrupt
"channels" are required, for example to
support a number of interrupt-driven communication
channels, we use the strategy of interrupt vectors.
Interrupt vectors are pointers or addresses that
tell the processor core where to jump to for the
interrupt service routine. In effect, this is a kind of
memory-mapped interrupt handling.

Once an I/O scenario is selected for every port of
the software host-language encapsulation, SYMPHONY
generates the necessary communication
software and the corresponding hardware I/O unit
by combining the selected I/O scenarios. The generated
communication software, the software
model of the processor core and the software host-language
encapsulation itself are merged and compiled
with the processor-specific C compiler.
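The text states that SYMPHONY proposes an address assignment that keeps the decoding logic simple, without giving the rule. One common way to achieve this, shown here only as an assumed illustration, is to place all memory-mapped channels at consecutive addresses inside one aligned power-of-two block, so the decoder compares the high address bits with the block and uses the low bits as a channel index. The function names and channel names below are hypothetical.

```python
from typing import Dict, List, Optional


def block_bits(n_channels: int) -> int:
    """Number of low address bits needed to give each channel its own address."""
    return max(0, (n_channels - 1).bit_length())


def assign_addresses(channels: List[str], base: int) -> Dict[str, int]:
    """Place the channels at consecutive addresses in one aligned power-of-two block."""
    k = block_bits(len(channels))
    if base % (1 << k) != 0:
        raise ValueError("base address must be aligned to the block size")
    return {name: base + i for i, name in enumerate(channels)}


def decode(address: int, base: int, n_channels: int) -> Optional[int]:
    """Address decoder: compare the high bits with the block, use the low bits as index."""
    k = block_bits(n_channels)
    if address >> k != base >> k:
        return None  # not a memory-mapped I/O address of this unit
    index = address & ((1 << k) - 1)
    return index if index < n_channels else None
```

With three channels at base 0x08000, the decoder only needs the address bits above bit 1 to match the block and the two lowest bits to select the channel; a user-supplied assignment can override this, as the text notes.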

[0133] In addition, SYMPHONY adds RAM and
ROM blocks to store the program code and data (in
FIGURE 7, the RAM and ROM are not shown explicitly: they
are part of the HW interface). The result of SYMPHONY
is a refinement of the original host language encapsulation
into a CoWare encapsulation of which the behavioral
interface is identical to that of the original
encapsulation. SYMPHONY effectively replaces a software
encapsulation by a hardware encapsulation that
has equivalent functionality.

[0134] In step (3), when two processors are not
protocol compatible, a protocol conversion process is
inserted. In FIGURE 6, the processor (65) compiled
with Cathedral-2/3 and the off-the-shelf processor (76)
have incompatible protocols. Protocol conversion (77) is
required to make them compatible.

[0135] A digital system in the CoWare design environment
can be simulated. Simulation is an implementation
of the digital system on one or more general-purpose
computers. The implementation process outlined
above can be followed to construct a simulation.
FIGURE 9 illustrates the construction of a simulation.
For simulation, the target processors are simulators (86)
running on processes (87) of the operating systems (88)
that run on the general-purpose computers. Allocation
and assignment determine the simulation architecture.
Arbitrary simulation architectures are supported by the
CoWare design environment. Support is provided to
select an optimal architecture for a given simulation
speed and debugging visibility.
[0136] The host language compilers mentioned in 
step (1) are now the simulation engines for the host lan- 
guages. 

[0137] In step (2), the inter-process communication
now consists of two parts. In a first part, the communication
is realized from the simulation engine to the OS
process on which the simulation engine is running. In a
second part, the communication is realized between two
different OS processes over the OS and network layer.
The communication between the simulation engine and
the OS process is performed via the application programming
interface of the simulation engine. The communication
between two different OS processes is done
through the OS inter-process communication primitives
(e.g. shared memory and semaphores for two processes
on a single OS, or TCP/IP sockets for two processes
on distinct computers).
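The socket-based variant of the second part can be sketched as an RPC carried over a stream connection between two simulation engines. This is a minimal Python sketch under assumptions: `socket.socketpair()` stands in for a TCP/IP connection between two hosts, the one-JSON-object-per-line framing is invented for illustration, and the names send_msg, recv_msg, serve_one, rpc and demo are hypothetical. The calling side blocks until the remote slave thread has finished, which preserves the RPC semantics.

```python
import json
import socket
import threading
from typing import Any, Callable, Dict


def send_msg(sock: socket.socket, msg: Dict[str, Any]) -> None:
    # Framing assumption: one JSON object per line.
    sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))


def recv_msg(reader) -> Dict[str, Any]:
    return json.loads(reader.readline())


def serve_one(sock: socket.socket, reader, slaves: Dict[str, Callable[[Any], Any]]) -> None:
    """Remote simulator side: run the slave thread bound to the requested port."""
    request = recv_msg(reader)
    result = slaves[request["port"]](request["value"])
    send_msg(sock, {"result": result})


def rpc(sock: socket.socket, reader, port: str, value: Any) -> Any:
    """Local simulator side: the RPC returns only after the remote slave has finished."""
    send_msg(sock, {"port": port, "value": value})
    return recv_msg(reader)["result"]


def demo() -> Any:
    # socketpair() stands in for a TCP/IP connection between two computers.
    a, b = socket.socketpair()
    with a, b, a.makefile("r", encoding="utf-8") as ra, b.makefile("r", encoding="utf-8") as rb:
        remote = threading.Thread(target=serve_one, args=(b, rb, {"ip": lambda v: v * 2}))
        remote.start()
        result = rpc(a, ra, "ip", 21)
        remote.join()
    return result
```

For two simulators on the same machine, the same request/reply pattern could run over shared memory guarded by semaphores instead of a socket.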

[0138] When the simulation engine used has a fixed
interface, as for example an instruction-set simulator for
a programmable processor, the hardware/software
interface is generated with SYMPHONY and can be
simulated as any other process.
[0139] The CoWare design environment supports
multi-abstraction-level simulation, which is the key to
efficient co-simulation. It allows the processes under
debug to be simulated at an appropriately low level of
abstraction for debugging purposes, while the other
processes in the system are simulated at the highest
appropriate abstraction level for maximal speed. The
time-consuming low-abstraction-level simulation is limited
to the smallest possible part of the system under simulation,
while these parts can still be simulated in the
system context.

[0140] Because both simulation and implementation
follow the same design process, it is possible to
construct hybrid simulation architectures in which part
of the system is implemented by simulators running on
OS processes and part of the system is implemented by
actual hardware. This is just one more manifestation of
the heterogeneity of digital system architectures.
[0141] In a specific embodiment of the present 
invention an application is disclosed in the sequel for 
hardware/software co-design of a pager application. 


SPECIFICATION OF THE PAGER 

[0142] Each block (89) in FIGURE 10 corresponds
to a process implementing a specific function of the
pager. This functional decomposition determines the
initial partitioning. The arrows (90) in between the
processes represent primitive channels with RPC semantics.

[0143] FIGURE 11 shows the RPC communication
in detail for part of the pager design. The blocks (91, 92,
93, 94, 95, 96, 97, 98, 99, 100, 101) in FIGURE 11
correspond to the processes (89) from FIGURE 10.
[0144] The Sample Clock Generator process (94)
contains an autonomous thread (102). This thread runs
continuously. It performs an RPC (103) over its input
port ip (104) to the Tracking & Acquisition process (93)
to obtain a new value for delta. The autonomous thread
(102) of the process (94) adds the delta parameter to
an internal variable until a threshold is exceeded. In
this way it implements a sawtooth function. When the
sawtooth exceeds the (fixed) threshold, an RPC call
(105) is issued to the A/D converter process (95). The
autonomous thread (102) of the Sample Clock Generator
(94) thus performs an RPC (105), i.e. gives a sample clock
tick, every threshold/delta iterations (real clock cycles).
[0145] The slave thread clock (106) in the A/D converter
process (95) samples the analogue input and
sends the result to the Down-conversion process (100)
via an RPC call (107). This in turn activates the
Decimation process (99) via an RPC call, and so on.
[0146] The Correlator Noise Estimator process (98)
contains a slave thread (108) associated with port ip
(109) to compute the correlation values. This slave
thread (108) is activated when the Phase Correction
process (97) writes data to the Correlator Noise Estimator
process (98), i.e. when the Phase Correction process
(97) performs an RPC (110) to the ip (109) port of
the Correlator Noise Estimator process (98). The slave
thread (108) reads in the data and then performs an
RPC (111) to the User Interface process (91) to obtain a
new value for the parameter par it requires for computing
the correlation values. Finally, the new correlation
results are sent to the Tracking Acquisition process (93)
via an RPC call (112) on its op port (113).
[0147] The slave thread (114) in the Tracking Acquisition
process (93) updates the delta value for the sawtooth
function implemented by the Sample Clock
Generator process (94). It puts the updated value in the
context (115), where it is retrieved by the slave thread
op (116), which serves RPC requests from the Sample
Clock Generator process (94). In this way, the Tracking
Acquisition process (93) influences the frequency of the
clock generated by the Sample Clock Generator process
(94). This example shows how the context (115) is
used for communication between threads inside the
same process, whereas the RPC mechanism is used for
communication between threads in different processes.
The locking (117) and unlocking (118) of the context
(115) is required to avoid concurrent accesses to the
variable delta. The lock (117) in the slave thread op
(116) locks the context (115) for read: other threads are
still allowed to read from the context (115), but no other
thread may write the context (115). The lock (119) in the
slave thread ip (114) locks the context (115) for write: no
other thread is allowed to write or read the context (115)
until it is unlocked again.
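The read and write locks on the context correspond to a classic readers-writer lock. Python's standard library has no such lock, so this minimal sketch builds one from `threading.Condition`; the class names RWLock and TrackingContext are hypothetical, and this simple version favors readers, so a writer can starve under a continuous stream of reads (a real implementation might add writer preference).

```python
import threading


class RWLock:
    """Readers-writer lock: many concurrent readers or one exclusive writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:             # readers wait only for an active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # a waiting writer may now proceed

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class TrackingContext:
    """Hypothetical model of the context (115) shared by the slave threads ip and op."""

    def __init__(self, delta: int) -> None:
        self._lock = RWLock()
        self._delta = delta

    def update_delta(self, value: int) -> None:
        """Slave thread ip (114): write lock, no other reader or writer meanwhile."""
        self._lock.acquire_write()
        try:
            self._delta = value
        finally:
            self._lock.release_write()

    def read_delta(self) -> int:
        """Slave thread op (116): read lock, other readers are still allowed."""
        self._lock.acquire_read()
        try:
            return self._delta
        finally:
            self._lock.release_read()
```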

[0148] Each process is described in the language
that best fits the characteristics of the function it
implements. The data-flow blocks (NCO (101), Down-conversion
(100), Decimation (99), Chip Matched Filter
(96), Phase Correction (97), Correlator Noise Estimator
(98), and Sample Clock Generator (94)) are described
in DFL. The control-oriented blocks (Tracking Acquisition
(93), Frame Extraction (92) and User Interface (91))
are described in C. The code in FIGURE 11 is pseudocode
meant for illustration and does not correspond to
the actual code.

DESIGN PROCESS 

[0149] After the initial specification of the system 
has been validated by simulation, the designer starts 
the refinement process. 

[0150] At this moment, it is not yet decided which
process will be implemented on which kind of target
processor, nor is it defined how the RPC communication
will be refined. However, the choice of the specification
language for each process restricts the choice of the
component compiler and in that sense partly determines
the target processor. Hence, studying possible
alternative assignments of a process to a target processor
may require the availability of a description of the
process in more than one specification language, or a
good guess of the best solution.

ALLOCATION AND ASSIGNMENT 

[0151] This step determines what processes will be 
implemented on what target processor. The initial spec- 
ification shows the finest grain partitioning: a process in 
the initial specification will never be split over several 
processors. However, it may be worthwhile to combine 
a number of processes inside a single processor. This is 
achieved by merging these processes into a single 
process that can then be mapped on the selected target 
processor by a host language compiler. Merging of 
processes is only allowed when the processes are 
described in the same specification language. Hence, 
studying possible alternative mergers may require that 
for a number of processes (e.g. Correlator Noise Esti- 
mator process) a description is available in more than 
one specification language. After allocation and assignment,
one obtains a description with a one-to-one mapping
of merged processes to processors.
[0152] In the pager example (FIGURE 12), the following
allocation, merging and assignment take place.
[0153] The NCO (120), Down-conversion (121),
and Decimation (122) processes are merged and
mapped in hardware onto an application-specific DSP
processor (123) because the sample rates of the merged
processes are identical, which implies that they can be
clocked at the same frequency. The advantage is that
only one clock tree needs to be generated per merged
process (instead of one per original process). An additional
advantage is that the scan chains for the processes that
are merged can be combined.
[0154] The Chip Matched Filter (124) and Phase
Correction (125) processes are merged and mapped
onto a CATHEDRAL-3 processor (126) because their
sample rates are identical.

[0155] The Correlator Noise Estimator process
(127) is mapped onto a CATHEDRAL-3 processor
(128). It is not merged with the Phase Correction process
(125) because it operates at a four times lower frequency.

[0156] The Sample Clock Generator (129) is
mapped onto a CATHEDRAL-3 processor (130).
[0157] Tracking Acquisition (131), Frame Extraction
(132), and User Interface (133) are merged and
mapped on a programmable processor (134). For this
design an ARM6 processor is chosen.
[0158] The hardware/software tradeoffs are based
on the following observations. To obtain a maximal
degree of flexibility, as much of the functionality as possible
is implemented in software on the ARM6 (134).
However, due to performance constraints of the ARM6
processor (134), there is a limit to what can be implemented
in software. The two main factors that play a
role in this problem are the following. The Tracking Acquisition
process (131) has to be implemented in software because
the algorithm used to perform tracking and acquisition
may be modified depending on the application domain
of the pager system.

[0159] The Correlator Noise Estimator process
(127) is not included in software because the input rate
of the Correlator Noise Estimator (127) is too high to
realize real-time communication between the ARM6
and the Phase Correction process (125). In addition, an
estimation of the number of cycles required to execute
each function on the ARM6 shows that implementing
the Correlator Noise Estimator process (127) in software
leaves insufficient time to perform tracking and
acquisition in between every two symbols.

[0160] After merging, each of the merged processes
can now be implemented on a separate target
processor by the appropriate compiler. The communication
between the merged processes is still done via
primitive ports and channels.

Communication Mechanism Selection 

[0161] After the partitioning of the system has been
verified by simulation and before the actual implementation
takes place, the designer may choose to refine the
communication mechanism between the processors.
This can be achieved by making explicit the behavior of
the channels between the processors.
[0162] In the running example, the processors can,
in principle, operate concurrently because each processor
has its own thread of control. By refining the RPC-based
communication scheme, we can pipeline the
processors: all processors operate concurrently and
synchronize at I/O points. This refined communication
scheme is called Blocked/UnBlocked Read/Write
communication. FIGURE 13 shows the pager with the
refined communication mechanism. The inputs and outputs
of the processors have been labeled with BW for
Blocked Write, BR for Blocked Read, and UBR for
UnBlocked Read.

[0163] BW-BR communication guarantees that no
data is ever lost. When the writing process has data
available, it will signal that to the reading process. If the
reading process is at that moment not ready to receive
the data (because it is still processing the previous
data), the writing process will block until the reading
process is ready to communicate. Conversely, if the
reading process needs new data, it will signal that to the
writing process. If the writing process is at that moment
not ready to send the data (because it is still computing
the data), the reading process will block until the writing
process is ready to communicate. The BW-BR scheme
is used in the main signal path.
[0164] A BW-BR scheme, however, is not used for
the parameter and mode setting for the main signal
path. If an accelerator used BR to read a parameter
value, it would be blocked until the parameter is provided.
Since the parameter setting is done in software, this would
slow down the computations in the main signal path
considerably. Therefore, parameter setting is done via a
BW-UBR scheme. This ensures that every parameter
change is read by the accelerators, but it leaves it up
to the accelerator to decide when to read the parameter.
[0165] In the CoWare design environment, the
refinement of the communication mechanism is performed
by making use of a hierarchical channel. A hierarchical
channel replaces a primitive channel by a
process that describes how communication over that
channel is carried out.

[0166] The introduction of BW-BR communication
is shown in detail for the Chip Matched Filter & Phase
Correction (135) and Correlator & Noise Estimator process
(136) in FIGURE 14.
[0167] The BWBR channel (137) contains an autonomous
thread (138) and a slave thread (139) that communicate
with each other via the shared variable
tmp[0..7] in the context (140). The slave thread (139) is
activated by an RPC (141) from the CMF & Phase Correction
process (135) and tries to update the context
(140) with new values. The autonomous thread (138)
continuously tries to read the values from the context
(140) and send them to the output port (144) via an
RPC (142), thereby activating the Correlator &
Noise Estimator process (136) that is attached to that
output (144). The blocking character of the communication
is taken care of by the use of a binary semaphore
rw (143). This guarantees that the input thread (139) will
block until the previous data has been read by the
autonomous thread (138) (no data is overwritten before
it has been read), and that the autonomous thread (138)
will block until new data is available (no data is read
twice). When the input slave thread (139) is blocked, the
CMF & Phase Correction process (135) that requested
its service via an RPC (141) is also blocked, because the
RPC (141) only returns after the slave thread (139)
has completed. When the autonomous thread (138) is
blocked, there are no RPC requests to the Correlator
Noise Estimator process (136), so that process (136) is
blocked automatically.
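The BWBR channel behaves like a one-slot buffer between a blocking writer and a blocking reader. The sketch below is an assumed reconstruction, not the code of FIGURE 14: it uses two counting semaphores (slot_free, data_ready), a standard way to obtain both blocking conditions, whereas the figure names a single binary semaphore rw whose exact use is not reproduced here. A separate lock protects the context, matching the remark that locking is unrelated to blocking. BWBRChannel and demo_bwbr are hypothetical names.

```python
import threading
from typing import Callable, List, Sequence


class BWBRChannel:
    """Hypothetical sketch of the hierarchical BWBR channel (137) as a one-slot buffer."""

    def __init__(self, forward: Callable[[List[int]], None]) -> None:
        self._forward = forward                    # RPC (142) on the output port (144)
        self._tmp: List[int] = [0] * 8             # shared variable tmp[0..7] in the context
        self._ctx_lock = threading.Lock()          # protects the context, unrelated to blocking
        self._slot_free = threading.Semaphore(1)   # writer may store new data
        self._data_ready = threading.Semaphore(0)  # reader may fetch new data

    def write(self, values: Sequence[int]) -> None:
        """Slave thread (139), run for each RPC (141) from CMF & Phase Correction."""
        self._slot_free.acquire()                  # block until previous data has been read
        with self._ctx_lock:
            self._tmp = list(values)
        self._data_ready.release()

    def step(self) -> None:
        """One iteration of the autonomous thread (138)."""
        self._data_ready.acquire()                 # block until new data is available
        with self._ctx_lock:
            data = list(self._tmp)
        self._slot_free.release()                  # the slot may now be overwritten
        self._forward(data)                        # activates the Correlator & Noise Estimator


def demo_bwbr(n: int) -> List[List[int]]:
    received: List[List[int]] = []
    channel = BWBRChannel(received.append)

    def autonomous_thread() -> None:
        for _ in range(n):
            channel.step()

    consumer = threading.Thread(target=autonomous_thread)
    consumer.start()
    for i in range(n):
        channel.write([i] * 8)                     # producer side; blocks while the slot is full
    consumer.join()
    return received
```

Every written vector is forwarded exactly once and in order, which is the "no data lost, no data read twice" guarantee of BW-BR.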

[0168] In the case of Blocked-Write, UnBlocked 
Read communication, the code for the autonomous 
thread is slightly modified. The thread always sends the 
value stored in the context, without checking whether it 
is updated. The same value can be sent more than 
once, but the thread will never be blocked. The input 
slave thread is identical to the BWBR case, and will 
block until the data has been read. 
[0169] In both cases, locking and unlocking of the 
context is required to avoid concurrent accesses to the 
shared variable in the context and, as such, has nothing 
to do with the blocking character of the communication. 
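The Blocked-Write, UnBlocked-Read variant only changes the reading side: the autonomous thread forwards whatever value is stored, possibly more than once, and never waits. The sketch below assumes the same one-slot structure as the BWBR case; BWUBRChannel and the `blocking` flag on write are hypothetical additions for illustration, and the stored value plays a role similar to the extra register (176) used for the par port.

```python
import threading
from typing import Callable


class BWUBRChannel:
    """Hypothetical sketch of a Blocked-Write / UnBlocked-Read channel."""

    def __init__(self, forward: Callable[[int], None], initial: int) -> None:
        self._forward = forward
        self._value = initial                 # value held in the context
        self._pending = False                 # True while a written value is still unread
        self._ctx_lock = threading.Lock()     # guards the context only
        self._slot_free = threading.Semaphore(1)

    def write(self, value: int, blocking: bool = True) -> bool:
        """Input slave thread, as in the BWBR case: waits until the previous value was read."""
        if not self._slot_free.acquire(blocking=blocking):
            return False                      # only possible when blocking=False
        with self._ctx_lock:
            self._value = value
            self._pending = True
        return True

    def step(self) -> None:
        """Autonomous thread: always forwards the stored value and never blocks."""
        with self._ctx_lock:
            value = self._value
            first_read = self._pending
            self._pending = False
        if first_read:
            self._slot_free.release()         # the writer may now store the next value
        self._forward(value)
```

For example, after `write(5)` a second non-blocking write is refused until one `step()` has read the 5, while repeated `step()` calls keep forwarding 5 without blocking.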

Implementation of the pager 

[0170] After the newly introduced communication 
mechanism has been verified by simulation, each proc- 
ess has to be synthesized on its assigned target proces- 
sor. 

Implementation of a Process in Hardware

[0171] FIGURE 15 illustrates the pure hardware 
implementation for the Correlator & Noise Estimator 
process (145) and the merged Phase Correction and 
Chip Matched Filter process (146). 
[0172] This hardware implementation for the pager 
consists of three distinct steps: 
[0173] The (merged) DFL processes are synthe- 
sized by the CATHEDRAL silicon compiler. The com- 
piler generates processors of which all the inputs and 
outputs are of the master type. These processors are 
shown in FIGURE 15 as the inner rectangles (147, 148). 
[0174] Each processor is encapsulated to make it 
consistent with the specification in which the DFL proc- 
esses have slave inputs. In addition, the encapsulation 
includes clock gating circuitry to control the activity of 
the processor. The encapsulated processors are shown 
in FIGURE 15 as the big rectangles (149, 150): they 
include the processor generated by CATHEDRAL (147, 
148) and some encapsulation hardware (151, 152, 
153). As can be observed the input ports (154, 155) of 
the encapsulated processors (149, 150) are now of the 
slave type. The encapsulation hardware (151, 152, 153) 
is shown in detail in FIGURE 16 as the blocks (155, 156, 
157). 

[0175] The BWBR process is implemented in hard- 
ware. In this case we obtain the gate-level implementa- 
tion of this process from the library. This implementation 
is functionally equivalent to the original C-like descrip- 
tion (137) of this block in FIGURE 14. FIGURE 16 
shows the detailed implementation (158) of the BWBR 
process that is used in the main signal path of the pager. 
Implementation of a Process in Software

[0176] To simplify the discussion, we will only look at
the transfer of the 14 correlation values to the Tracking
Acquisition process, and the setting of a parameter
value by the User Interface. We also know that the
transfer of the correlation values has to be BWBR and
the transfer of the parameter value has to be BWUBR.
[0177] The hardware interface and the software I/O
device driver are generated automatically with SYMPHONY.
To generate these interfaces, SYMPHONY
analyses the ports of the software process. For each of
these ports, SYMPHONY scans the library of I/O scenarios
for an applicable scenario. The user is asked to
select the most appropriate scenario amongst the applicable
ones. SYMPHONY then combines the selected
scenarios into a software I/O device driver and a hardware
interface.
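The per-port scan of the scenario library can be sketched as a filter followed by a choice. This is a minimal Python sketch under assumptions: the data classes, the example library entries and their cycle counts are hypothetical, and the `fastest` function merely stands in for the user's interactive selection by picking the scenario with the best performance figure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Port:
    name: str
    kind: str  # e.g. "inslave", "outmaster"


@dataclass(frozen=True)
class IOScenario:
    name: str
    port_kinds: FrozenSet[str]  # kinds of software port the scenario can implement
    cycles_per_transfer: int    # hypothetical performance figure attached to the scenario


Chooser = Callable[[Port, List[IOScenario]], IOScenario]


def fastest(port: Port, options: List[IOScenario]) -> IOScenario:
    """Stand-in for the user's choice: the applicable scenario with the best figure."""
    return min(options, key=lambda s: s.cycles_per_transfer)


def select_scenarios(ports: Sequence[Port],
                     library: Sequence[IOScenario],
                     choose: Chooser = fastest) -> Dict[str, IOScenario]:
    selection: Dict[str, IOScenario] = {}
    for port in ports:
        options = [s for s in library if port.kind in s.port_kinds]  # applicable scenarios
        if not options:
            raise LookupError(f"no applicable I/O scenario for port {port.name}")
        selection[port.name] = choose(port, options)
    return selection


# Hypothetical library for illustration only.
LIBRARY = [
    IOScenario("memory-mapped + FIQ", frozenset({"inslave"}), 40),
    IOScenario("memory-mapped, polled", frozenset({"inslave", "outmaster"}), 60),
    IOScenario("memory-mapped register", frozenset({"outmaster"}), 10),
]
```

Passing a different `choose` callback models a designer who overrides the proposal, for instance to favor a scenario that uses fewer processor pins.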

[0178] In the example (FIGURE 17), there are two
ports to be implemented:

bool[32][14] corr: inslave (159) is used to transfer
the correlation values. This is a port of type inslave
that transports an array of 14 bit vectors, each of size 32.
bool par: outmaster (160) is used to set a parameter
of the correlation block. This is a port of type outmaster
that transports a boolean value.

[0179] For the corr port (159), SYMPHONY proposes
the scenario depicted in FIGURE 17. The memory
port of the ARM will be used to transfer the
correlation values and the FIQ port of the processor will
be used to initiate the transfer. The I/O scenario
describes which blocks need to be inserted in software
and hardware to realize this kind of communication. In
total, three hardware and three software blocks are
required to implement the communication over the corr
port. Unpack (161). The memory port of the ARM is
obviously not wide enough to transfer the 14 correlation
values in parallel. Therefore, the scenario sequentializes
the transfer. Of the 14 correlation values, 13 are
stored internally in the Unpack block (161). The 14th
value is sent to the Split block (162).
[0180] Split (162) stores the 14th value internally
and then activates the FIQ port of the ARM processor.
Activating the FIQ port (163) of the hardware model
(164) has as a consequence that an RPC is issued on
the interrupt port (165) of the software model (166). This
port (165) is connected with the Join (167) block.
[0181] Join (167) retrieves the 14th correlation
value by issuing an RPC to the corresponding input port
(168) of the Demux block (169). Demux (169). The data
transfer is implemented through memory-mapped I/O.
Therefore, when selecting this I/O scenario, the user
should decide on the address that will be used for the
transfer. When one of the input ports of the Demux
block is activated, an RPC to the memory port (170) will
be performed with an address that corresponds to the
activated input port.

[0182] Mux (171). At the hardware side, the memory
port (172) issues an RPC to the Mux (171) block whenever
it (172) is activated. In that block (171), the address
is decoded and the corresponding output port is
activated to retrieve the correlation value that was
stored locally in hardware (either in the Split (162) or in
the Unpack block (161)). Pack (173). After the 14th
correlation value has been retrieved by the Join block
(167), it is passed on to the Pack block (173), which
then retrieves the 13 other correlation values by issuing
consecutive RPCs to the different ports of the Demux block
(169). Finally, when the Pack block (173) has retrieved
all 14 values, it packs them in an array and activates the
original software application code in the Tracking &
Acquisition process (174).
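The interplay of the six blocks can be traced end to end in a small model. This is a minimal Python sketch under assumptions: BASE is a hypothetical address chosen for the transfer, value i is mapped to address BASE + i, the FIQ interrupt and all RPCs are direct function calls, and the class names HardwareSide and SoftwareSide are illustrative only.

```python
from typing import Callable, List, Optional, Sequence

N_VALUES = 14
BASE = 0x08000  # hypothetical base address picked by the user for this scenario


class HardwareSide:
    """Unpack, Split and Mux blocks (hardware part of the scenario)."""

    def __init__(self, raise_fiq: Callable[[], None]) -> None:
        self.unpack: List[int] = []        # 13 values held by Unpack
        self.split: Optional[int] = None   # 14th value held by Split
        self.raise_fiq = raise_fiq         # activates the FIQ port of the hardware model

    def corr(self, values: Sequence[int]) -> None:
        """Slave port corr: receives all 14 correlation values at once."""
        if len(values) != N_VALUES:
            raise ValueError("expected 14 correlation values")
        self.unpack = [v & 0xFFFFFFFF for v in values[:N_VALUES - 1]]
        self.split = values[N_VALUES - 1] & 0xFFFFFFFF
        self.raise_fiq()                   # Split signals that data is ready

    def mux(self, address: int) -> int:
        """Decode a memory-port read and return the matching stored value."""
        index = address - BASE
        if index == N_VALUES - 1 and self.split is not None:
            return self.split
        if 0 <= index < N_VALUES - 1:
            return self.unpack[index]
        raise ValueError(f"address {address:#x} not decoded by this interface")


class SoftwareSide:
    """Join, Demux and Pack blocks (software part of the scenario)."""

    def __init__(self, mem_read: Callable[[int], int],
                 application: Callable[[List[int]], None]) -> None:
        self.mem_read = mem_read           # RPC on the ARM memory port
        self.application = application     # original Tracking & Acquisition code

    def demux(self, index: int) -> int:
        """Each input port maps to one address; using it performs a memory read."""
        return self.mem_read(BASE + index)

    def join(self) -> None:
        """Bound to the interrupt port: fetch the 14th value, then hand it to Pack."""
        self.pack(self.demux(N_VALUES - 1))

    def pack(self, last: int) -> None:
        values = [self.demux(i) for i in range(N_VALUES - 1)] + [last]
        self.application(values)


def build():
    received: List[List[int]] = []
    hw = HardwareSide(raise_fiq=lambda: sw.join())  # sw is bound just below
    sw = SoftwareSide(mem_read=hw.mux, application=received.append)
    return hw, received
```

Calling `hw.corr(values)` on the result of `build()` triggers the interrupt, the 14 sequential memory reads and finally one call of the application with the reassembled array.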

[0183] All these blocks are described in a generic 
way in a library where they can be retrieved and cus- 
tomized by SYMPHONY. 

[0184] The solution for the par port (160) is much 
simpler. Since it is an outmaster port it can directly be 
mapped on the memory port. However, since the mem- 
ory port is already used, an extra multiplexer (175) is 
required. This is shown in FIGURE 17. To implement 
the unblocked-read character of the transfer, an extra
register (176) is required on the hardware side. 
[0185] Before proceeding along the implementation path, all processes that were added by SYMPHONY are merged. The hardware interfaces are merged into one hardware interface block that can then be implemented with RT-level synthesis tools. The I/O device driver processes are merged with the original SW application code. As a consequence of this in-lining, the complete tracking and acquisition slave thread moves into the interrupt routine: whenever new correlation values are ready, the main software thread is interrupted to run the tracking and acquisition algorithm, and once that interrupt has been processed, the main thread resumes.
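The run-time behaviour after merging, with the tracking and acquisition code running inside the interrupt routine, can be illustrated with a deterministic single-core model. It is a sketch under simplifying assumptions: interrupts are checked only between steps of the main thread, `Processor` and `run_main` are hypothetical names, and `tracking_and_acquisition` is a placeholder that merely records the index of the largest correlation value, not the actual algorithm.

```python
class Processor:
    """Single-core model: the main thread runs in steps, and pending
    interrupts are serviced between steps."""

    def __init__(self, isr):
        self.isr = isr      # merged interrupt routine
        self.pending = []   # correlation arrays signalled by the hardware
        self.trace = []

    def raise_interrupt(self, values):
        self.pending.append(values)

    def run_main(self, steps, hw_events):
        # hw_events: step index -> correlation values that become ready
        # just before that step of the main thread
        for step in range(steps):
            if step in hw_events:
                self.raise_interrupt(hw_events[step])
            while self.pending:               # main thread is interrupted
                self.trace.append("isr")
                self.isr(self.pending.pop(0))
            self.trace.append(f"main{step}")  # main thread resumes


peaks = []


def tracking_and_acquisition(values):
    # Placeholder for the real algorithm: record the index of the peak
    peaks.append(values.index(max(values)))


cpu = Processor(tracking_and_acquisition)
cpu.run_main(3, {1: [3, 9, 4] + [0] * 11})
print(cpu.trace)  # ['main0', 'isr', 'main1', 'main2']
print(peaks)      # [1]
```

The trace shows the ordering the text describes: the interrupt routine runs as soon as new values are signalled, and the main thread continues afterwards.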
[0186] The above description discloses a design environment and a design methodology that meet the requirements of modularity, encapsulation of different description languages, modeling from a heterogeneous conceptual specification to a resulting heterogeneous architecture together with all refinement steps in between, modeling capabilities for off-the-shelf components and their associated design environments, separation between functional and communication behavior, and processor-independent interface synthesis. Other embodiments of the present invention will nevertheless be apparent to the person skilled in the art, the spirit and scope of the present invention being limited only by the terms of the appended claims.

Claims 

1. A design apparatus compiled on a computer environment for generating from a heterogeneous specification a heterogeneous implementation of a system, said system comprising at least one digital system part, said heterogeneous specification comprising a number of paradigms having associated behavioural and structural languages, said heterogeneous implementation comprising hardware parts and software parts, said apparatus comprising:

a database compiled on said computer environment, adapted for access by executable programs on said computer environment for generating the implementation of said system, said database comprising a plurality of objects representing aspects of said system wherein said objects comprise primitive objects representing the specification of said system and hierarchical objects being created by said executable programs while generating the implementation of said system, said hierarchical objects being refinements of said primitive objects and having more detail and preserving any one or all of said aspects to thereby generate said implementation of said system; and said database further comprising relations between said primitive objects, between said hierarchical objects, and between said primitive objects and said hierarchical objects; and further comprising functions for manipulating said objects and said relations;

means for simulating said system comprising a plurality of simulators for said behavioural and structural languages;

means for implementing said system comprising a plurality of compilers for said behavioural and structural languages;

means for allocating hardware components for an implementation of said system;

means for assigning hardware subsystems and software subsystems to allocated hardware components;

means for implementing the communication between said software subsystems and said hardware subsystems, one of the aspects of said communication being represented by ports; characterized in that said apparatus further comprises:

means for creating processor models of said allocated hardware components as primitive objects in said database, said models comprising software models representing the software views on said hardware components and hardware models representing the hardware views on said hardware components;

means for specifying said system in said apparatus, the specification in said apparatus comprising processes being primitive objects and comprising ports, said ports structuring the communication between said processes and being primitive objects, said processes comprising a description of each of the different parts of said system in a specific host language encapsulation, said encapsulation comprising threads and context, a plurality of behavioural and structural languages being supported in said apparatus; and

means for encapsulating said simulators, said compilers, said hardware components, said hardware subsystems and said software subsystems in said apparatus to thereby create a consistent communication between said encapsulation and said encapsulated simulators, compilers, hardware components, hardware subsystems and software subsystems, thereby enabling the refinement process of said primitive objects while generating the implementation of said system.

2. The design apparatus as recited in claim 1 further comprising means for creating I/O scenario models of said ports as objects in said database, said I/O scenario models representing the implementation of said ports on said hardware components, said implementation comprising said software subsystems, said hardware subsystems, and processor models with connections therebetween.

3. The design apparatus as recited in claim 2 wherein the implementation of the communication between a first software subsystem and a first hardware subsystem results in said first software subsystem with a first port being replaced by a second hardware subsystem with a second port, said first port and said second port representing an essentially identical communication.

4. The design apparatus as recited in claim 3 further 
comprising: 

means for selecting I/O scenario models for the 
ports of said first software subsystem; 
means for combining the software subsystems 
of the selected I/O scenarios; 
means for combining the hardware subsystems 
of the selected I/O scenarios. 

5. The design apparatus as recited in claim 4 wherein a first I/O scenario model represents the connection of said first port to said second port, said connection comprising a connection of said first port to said software subsystems of said I/O scenario model, connections of said software subsystems of said I/O scenario model to said software model, connections of said hardware model to said hardware subsystems of said I/O scenario model, and a connection of said hardware subsystems of said I/O scenario model to said second port.
6. The design apparatus as recited in claim 5 wherein I/O scenario models comprise memory mapped I/O scenarios, instruction programmed I/O scenarios, and interrupt based I/O scenarios.

7. The design apparatus as recited in claim 1 wherein 
said implementation is a simulation of said digital 
system. 

8. The design apparatus as recited in claim 7 wherein said simulation is a multi-platform simulation being executed on a plurality of computers.

9. The design apparatus as recited in claim 8 wherein said simulation is a hybrid simulation comprising substantially simultaneous hardware implementations and computer simulations.

10. The design apparatus as recited in claim 1 wherein said software subsystems are executed on one or more of said hardware subsystems.

11. The design apparatus as recited in claim 10 wherein said hardware subsystems comprise any one or more of processor cores, off-the-shelf components, custom components, ASICs, processors, or boards.

12. The design apparatus as recited in claim 1 wherein said aspects are functional or communication or concurrency or structural aspects of said digital system.

13. The design apparatus as recited in claim 1, wherein said refinements are individual refinements.